[ 
https://issues.apache.org/jira/browse/PIG-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281205#comment-13281205
 ] 

Jeff Lord commented on PIG-2690:
--------------------------------

Daniel,

More than happy to.
Though I have looked at the current docs and they appear to be accurate.

"You will also see better performance if the data in the left table is 
partitioned evenly across part files (no significant skew and each part file 
contains at least one full block of data)."
http://pig.apache.org/docs/r0.10.0/perf.html#merge-joins
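
As a rough illustration of that guideline (this is not from the Pig docs), a small sketch like the one below could flag left-side part files that are smaller than one block or badly skewed. The 128 MB block size, the function name check_part_files, and the max_skew threshold are all assumptions made up for the example; the sizes would have to come from your own listing of the input directory.

```python
from typing import List

# Assumed HDFS block size (128 MB); adjust to match the cluster.
BLOCK_SIZE = 128 * 1024 * 1024


def check_part_files(sizes: List[int],
                     block_size: int = BLOCK_SIZE,
                     max_skew: float = 2.0) -> List[str]:
    """Return warnings for a list of part-file sizes given in bytes.

    max_skew is a hypothetical threshold: the largest part file may be
    at most max_skew times the size of the smallest one.
    """
    if not sizes:
        return ["no part files given"]

    warnings = []

    # Each part file should contain at least one full block of data.
    for index, size in enumerate(sizes):
        if size < block_size:
            warnings.append(
                f"part {index}: {size} bytes is less than one block"
            )

    # Part files should be roughly uniform in size (no significant skew).
    smallest, largest = min(sizes), max(sizes)
    if largest > max_skew * smallest:
        warnings.append(
            f"skew: largest part ({largest} bytes) is more than "
            f"{max_skew}x the smallest ({smallest} bytes)"
        )

    return warnings
```

An empty result means the sizes satisfy both conditions under these assumed thresholds; it says nothing about whether the data is actually sorted on the join key.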

Please let me know if there is something I may have overlooked.

-J
                
> Pig Documentation regarding Merge Join is confusing
> ---------------------------------------------------
>
>                 Key: PIG-2690
>                 URL: https://issues.apache.org/jira/browse/PIG-2690
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation, site
>    Affects Versions: 0.7.0, 0.8.1
>            Reporter: Jeff Lord
>              Labels: docuentation
>         Attachments: fixDocs_0.patch
>
>
> The documentation regarding merge join in Pig is a bit off.
> http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Merge+Joins
> "For optimal performance, each part file of the left (sorted) input of the 
> join should have a size of at least 1 hdfs block size (for example if the 
> hdfs block size is 128 MB, each part file should be less than 128 MB). If the 
> total input size (including all part files) is greater than blocksize, then 
> the part files should be uniform in size (without large skews in sizes)."
> This is confusing and should read something more akin to this:
> http://wiki.apache.org/pig/PigMergeJoin
> For optimal performance, each part file of the left (sorted) input of the 
> join should have a size of at least 1 hdfs block size (for example if the 
> hdfs block size is 128 MB, each part file should be > 128 MB). If the total 
> input size (including all part files) is < a blocksize, then the part files 
> should be uniform in size (without large skews in sizes). The main idea is to 
> eliminate skew in the amount of input the final map job performing the 
> merge-join will process.

