[ https://issues.apache.org/jira/browse/PIG-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-4377:
----------------------------
Attachment: PIG-4377-1.patch
Attached a fix.
Here is what happens:
1. A certain key x is sampled (by PoissonSampleLoader/PartitionSkewedKeys) and
allotted y reducers
2. In fact, only y1 < y records carry key x
3. Some of the reducers that are supposed to receive key x therefore get no row
with key x
4. Each reducer that gets no row with key x generates a redundant empty left
relation (CompilerUtils.addEmptyBagOuterJoin)
What the patch does:
Generate the empty left relation only in the first reducer for key x
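
To illustrate the steps above, here is a minimal simulation sketch in Python. It is not Pig code and not taken from the patch: the function name skewed_right_outer_join, the round-robin distribution of left rows, and the replication of right rows to every reducer are all assumptions made to model a single skewed key in a simplified way.

```python
def skewed_right_outer_join(left_rows, right_rows, num_reducers, first_reducer_only):
    """Simulate one skewed key x spread over num_reducers reducers.

    Assumptions (hypothetical model, not Pig's implementation):
    - left rows for key x are dealt round-robin, starting at reducer 0;
    - right rows for key x are replicated to every reducer of key x.
    """
    # Deal the left rows across the reducers allotted to key x.
    partitions = [[] for _ in range(num_reducers)]
    for i, row in enumerate(left_rows):
        partitions[i % num_reducers].append(row)

    output = []
    for reducer_index, left_part in enumerate(partitions):
        if left_part:
            # Normal case: join every left row with every right row.
            for left in left_part:
                for right in right_rows:
                    output.append((left, right))
        elif not first_reducer_only or reducer_index == 0:
            # No left row arrived here: pad with an empty (null) left side.
            # Without the first-reducer restriction, every starved reducer
            # does this, which yields the redundant rows.
            for right in right_rows:
                output.append((None, right))
    return output


# Key x was allotted y = 3 reducers, but only y1 = 1 left record carries it.
buggy = skewed_right_outer_join(["a1"], ["b1"], 3, first_reducer_only=False)
fixed = skewed_right_outer_join(["a1"], ["b1"], 3, first_reducer_only=True)
print(buggy)
print(fixed)
```

In this model the unrestricted variant emits one extra null-left row for each reducer that received no left row, while the restricted variant still pads correctly when key x has no left rows at all, because the first reducer then emits the null-left rows exactly once.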
> Skewed outer join produce wrong result in some cases
> ----------------------------------------------------
>
> Key: PIG-4377
> URL: https://issues.apache.org/jira/browse/PIG-4377
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4377-1.patch, reproduce.patch
>
>
> Skewed outer join produces more rows than expected under certain conditions.
> The extra rows contain a null left relation. This can be reproduced reliably
> with reproduce.patch (run SkewedJoin_11).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)