[ 
https://issues.apache.org/jira/browse/PIG-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755776#comment-15755776
 ] 

Rohini Palaniswamy commented on PIG-3417:
-----------------------------------------

Comments:
   1) 
bq. PartitionSkewedKeys would work on ((key1, key2, ...), (tuple mem size, key 
count)) format for composite keys, and on (key, (tuple mem size, key count)) 
format for non-composite key.
Shouldn't it be ((key1, key2, ...), tuple mem size, key count) and (key, tuple 
mem size, key count)? I don't see why tuple mem size and key count need to be 
in a tuple of their own. That is, instead of going from New For 
Each(true,true)\[tuple\] to New For Each(false,false)\[tuple\], you can use New 
For Each(false,true)\[tuple\] so that the key is not flattened but the stats 
are. This avoids an unnecessary increase in the size of the sampling data and 
also reduces the number of changes needed in your patch.
   2) TestTezCompiler and TestMRCompiler, which compare the generated plans 
against golden files, should be failing now that the plan has changed. The 
golden files will have to be updated. You can set generate = true in the test 
class to regenerate them easily.
   3) testSkewJoinWithTuples - Please assert the actual output and not just the 
size. It would be good to have an e2e test added for this case as well.
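
To make the layout difference in comment 1 concrete, here is a rough sketch in plain Python. It only models the record shapes with built-in tuples; it does not use Pig's Tuple classes, and the helper names (nested_layout, flat_layout, count_tuples) and the sample values are made up for illustration.

```python
def nested_layout(key, mem_size, key_count):
    """Layout described in the patch: stats wrapped in their own tuple."""
    return (key, (mem_size, key_count))


def flat_layout(key, mem_size, key_count):
    """Layout suggested in comment 1: key left as-is, stats flattened."""
    return (key, mem_size, key_count)


def count_tuples(value):
    """Count tuple objects in a record, including the record itself."""
    if not isinstance(value, tuple):
        return 0
    return 1 + sum(count_tuples(field) for field in value)


# Hypothetical sample: a composite key plus made-up stats.
composite_key = ("a", 1)
nested = nested_layout(composite_key, 128, 3)
flat = flat_layout(composite_key, 128, 3)

print(nested)  # (('a', 1), (128, 3))
print(flat)    # (('a', 1), 128, 3)
print(count_tuples(nested), count_tuples(flat))  # 3 2
```

In both layouts the key stays at position 0 as a single field, so a composite key is still handled as one tuple; the flat layout just drops the extra wrapper around the stats in every sample record.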

> Skewed Join On Tuple Column Kills Job 
> --------------------------------------
>
>                 Key: PIG-3417
>                 URL: https://issues.apache.org/jira/browse/PIG-3417
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Nick White
>            Assignee: Nandor Kollar
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: PIG-3417.patch, TestSkewJoinWithTuples.java
>
>
> I've attached a test case that fails, but should pass. The test case groups 
> two relations separately, then full-outer joins them on the grouped columns. 
> The test case passes if "using 'skewed'" is removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
