[ https://issues.apache.org/jira/browse/PIG-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400602#comment-13400602 ]
Bill Graham commented on PIG-483:
---------------------------------

Good point. In the current design the job graph is assumed to be immutable. Here are two options we can consider:

1. Change that design to allow the graph to be modified at runtime. Ambrose would need to adapt. Ambrose aside, this would probably add complexity to other parts of the Pig execution engine that would need to be worked out.
2. Introduce the notion of a skipped job, i.e. one that has been optimized out. This would work in this situation, but it wouldn't work for future optimizations that add jobs (e.g., auto-detecting skew and switching to a skew join).

Can anyone comment on the expected complexity of adapting the physical plan to accommodate either of these approaches?

> PERFORMANCE: different strategies for large and small order bys
> ---------------------------------------------------------------
>
> Key: PIG-483
> URL: https://issues.apache.org/jira/browse/PIG-483
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.2.0
> Reporter: Olga Natkovich
> Labels: gsoc2011, performance
>
> Currently Pig always does a multi-pass order by, where it first determines a
> distribution for the keys and then orders in a second pass. This avoids the
> need for a single reducer. However, when the data is small enough to fit
> into a single reducer, this is inefficient. For small data sets it would be
> good to detect the small size of the set and do the order by in a single
> pass with a single reducer.
> This is a candidate project for Google Summer of Code 2011. More information
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
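Option 2 (marking an optimized-out job as skipped instead of removing it from the graph) might look roughly like the Java sketch below, applied to the PIG-483 case where the key-sampling job becomes unnecessary for small inputs. Every class, method, and job name and the size threshold are hypothetical; this is not Pig's or Ambrose's API. It only models the graph and job states, not how the sort job would actually be reconfigured to use a single reducer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option 2: the graph's node set is fixed once planned;
// an optimized-out job is marked SKIPPED rather than removed.
// None of these names are Pig or Ambrose APIs.
public class SkippedJobSketch {

    enum JobState { PENDING, RUNNING, SUCCEEDED, SKIPPED }

    static final class Job {
        final String id;
        final List<String> dependsOn;
        JobState state = JobState.PENDING;

        Job(String id, List<String> dependsOn) {
            this.id = id;
            this.dependsOn = Collections.unmodifiableList(new ArrayList<>(dependsOn));
        }
    }

    // Keyed by job id; insertion order is preserved for display.
    private final Map<String, Job> jobs = new LinkedHashMap<>();

    void addJob(String id, List<String> dependsOn) {
        jobs.put(id, new Job(id, dependsOn));
    }

    void markSucceeded(String id) { jobs.get(id).state = JobState.SUCCEEDED; }

    void skip(String id) { jobs.get(id).state = JobState.SKIPPED; }

    JobState stateOf(String id) { return jobs.get(id).state; }

    // Skipping changes a job's state, never the number of nodes.
    int size() { return jobs.size(); }

    // Runnable when still pending and every dependency has succeeded or was skipped.
    boolean isRunnable(String id) {
        Job job = jobs.get(id);
        if (job.state != JobState.PENDING) {
            return false;
        }
        for (String dep : job.dependsOn) {
            JobState s = jobs.get(dep).state;
            if (s != JobState.SUCCEEDED && s != JobState.SKIPPED) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical PIG-483 decision: at or below an assumed size threshold, skip
    // the key-sampling job so the order by can run in a single pass. Switching
    // the sort job to one reducer is not modeled here.
    static SkippedJobSketch planOrderBy(long estimatedInputBytes, long thresholdBytes) {
        SkippedJobSketch graph = new SkippedJobSketch();
        graph.addJob("upstream", List.of());
        graph.addJob("sample", List.of("upstream"));
        graph.addJob("sort", List.of("upstream", "sample"));
        if (estimatedInputBytes <= thresholdBytes) {
            graph.skip("sample");
        }
        return graph;
    }

    public static void main(String[] args) {
        SkippedJobSketch small = planOrderBy(10L << 20, 256L << 20); // 10 MiB vs 256 MiB
        small.markSucceeded("upstream");
        // prints "3 jobs, sample=SKIPPED, sort runnable=true"
        System.out.println(small.size() + " jobs, sample=" + small.stateOf("sample")
                + ", sort runnable=" + small.isRunnable("sort"));
    }
}
```

Because the set of nodes never changes, a consumer such as Ambrose could render the same graph before and after the decision and only see a state change. As noted in option 2, this approach does not cover optimizations that need to add jobs.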