[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960409#comment-15960409 ]
Daniel Dai commented on PIG-5211:
---------------------------------

Thanks for the patch, pretty good actually. Several comments:
1. LimitedSortedDataBag should not extend SortedDataBag, or even DefaultAbstractBag, since it does not use mContents and does not handle spill. I'd rather implement DataBag directly, including all of its methods. It shouldn't be too hard, since we don't need to deal with spill.
2. Comparator.reversed() is only available in JDK 1.8. We need to make sure Pig compiles under JDK 1.7 as well.
3. We need to add a test case that not only makes sure it uses the limited LOSort, but also makes sure it translates to the right physical plan, runs, and generates the right result.
4. I am fine with NestedLimitOptimizer only dealing with a limit right after a sort for now, but we need to create a Jira to deal with operators in between (push the limit all the way up, similar to LimitOptimizer).

> Optimize Nested Limited Sort
> ----------------------------
>
>                 Key: PIG-5211
>                 URL: https://issues.apache.org/jira/browse/PIG-5211
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jin Sun
>            Assignee: Jin Sun
>             Fix For: 0.17.0
>
>         Attachments: PIG-5211-1.patch, PIG-5211-2.patch
>
>
> Currently, in a FOREACH clause, if both LIMIT and ORDER BY are present, Pig
> stores all elements and sorts them. It should use a priority queue to be more
> space-efficient.
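The priority-queue idea from the issue description, together with the JDK 1.7 constraint from comment 2, can be sketched in plain Java. This is a hypothetical, standalone illustration using only java.util (the class LimitedSortSketch and the method sortedLimit are made-up names), not the LimitedSortedDataBag from the patch and not a DataBag implementation. It keeps at most `limit` elements in a heap whose head is the worst element retained so far, and it builds the reversed comparator with Collections.reverseOrder(Comparator), which exists before JDK 1.8, instead of Comparator.reversed().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a "limited sort"; not Pig's actual LimitedSortedDataBag.
public class LimitedSortSketch {

    /**
     * Returns the first `limit` elements of `input` in the order defined by `cmp`,
     * holding at most `limit` elements in memory at any time.
     */
    public static <T> List<T> sortedLimit(Iterable<T> input, int limit, Comparator<T> cmp) {
        List<T> result = new ArrayList<T>();
        if (limit <= 0) {
            return result; // PriorityQueue rejects an initial capacity below 1
        }
        // Reversing cmp makes the heap head the element that sorts LAST among those kept.
        // Collections.reverseOrder(Comparator) compiles on JDK 1.7, unlike Comparator.reversed().
        PriorityQueue<T> heap = new PriorityQueue<T>(limit, Collections.reverseOrder(cmp));
        for (T item : input) {
            if (heap.size() < limit) {
                heap.add(item);
            } else if (cmp.compare(item, heap.peek()) < 0) {
                // item sorts before the current worst kept element: evict the worst one
                heap.poll();
                heap.add(item);
            }
        }
        // Draining yields the kept elements from last to first; reverse into final order.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Anonymous class instead of a lambda, to stay JDK 1.7 compatible.
        Comparator<Integer> ascending = new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return a.compareTo(b);
            }
        };
        List<Integer> data = Arrays.asList(5, 1, 9, 3, 7, 2);
        // Roughly "ORDER BY x ASC" followed by "LIMIT 3" inside a FOREACH
        System.out.println(sortedLimit(data, 3, ascending)); // [1, 2, 3]
        // Roughly "ORDER BY x DESC" followed by "LIMIT 3"
        System.out.println(sortedLimit(data, 3, Collections.reverseOrder(ascending))); // [9, 7, 5]
    }
}
```

With a limit of k over n input tuples, this keeps memory proportional to k rather than n and does O(n log k) comparisons, which is the space saving the description asks for; because nothing beyond k elements is ever retained, there is also nothing to spill.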