[ 
https://issues.apache.org/jira/browse/PIG-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1270:
----------------------------

    Attachment: PIG-1270-2.patch

PIG-1270-2.patch fixes all unit tests. However, I did not see a noticeable 
performance improvement. The script I tested is:

a = load 'studenttab20m' as (name, age, gpa);
b = limit a 10;
dump b;

I ran it in both local mode and mapreduce mode. 

Further investigation is needed to find out why performance did not improve.

> Push limit into loader
> ----------------------
>
>                 Key: PIG-1270
>                 URL: https://issues.apache.org/jira/browse/PIG-1270
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>         Attachments: PIG-1270-1.patch, PIG-1270-2.patch
>
>
> We can optimize the limit operation by stopping early in PigRecordReader. In 
> general, we need a way to communicate between PigRecordReader and the 
> execution pipeline. POLimit could tell PigRecordReader that we already have 
> enough records, so that it stops feeding more data.
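
The idea in the description above can be illustrated with a minimal Java
sketch. It is not taken from either patch and does not use Pig's real
classes: SketchReader and SketchLimit are hypothetical stand-ins for
PigRecordReader and POLimit, and the shared AtomicBoolean is only one assumed
way for the two sides to communicate.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the reader side; NOT Pig's actual PigRecordReader.
class SketchReader {
    private final Iterator<String> source;
    private final AtomicBoolean enough; // shared "stop" flag (assumed mechanism)
    int recordsRead = 0;                // how many records were pulled from the source

    SketchReader(Iterator<String> source, AtomicBoolean enough) {
        this.source = source;
        this.enough = enough;
    }

    // Returns the next record, or null once the source is exhausted
    // or the pipeline has signalled that it needs no more input.
    String next() {
        if (enough.get() || !source.hasNext()) {
            return null;
        }
        recordsRead++;
        return source.next();
    }
}

// Hypothetical stand-in for the limit operator; NOT Pig's actual POLimit.
class SketchLimit {
    private final long limit;
    private final AtomicBoolean enough;
    private long seen = 0;

    SketchLimit(long limit, AtomicBoolean enough) {
        this.limit = limit;
        this.enough = enough;
        if (limit <= 0) {
            enough.set(true); // nothing is wanted, so stop before the first read
        }
    }

    // Passes the record through and raises the flag once the limit is reached.
    String accept(String record) {
        seen++;
        if (seen >= limit) {
            enough.set(true);
        }
        return record;
    }
}

class LimitPushdownSketch {
    // Drives reader -> limit and returns the records the limit let through.
    static List<String> run(SketchReader reader, SketchLimit limit) {
        List<String> out = new ArrayList<>();
        String rec;
        while ((rec = reader.next()) != null) {
            out.add(limit.accept(rec));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row" + i);
        }
        AtomicBoolean enough = new AtomicBoolean(false);
        SketchReader reader = new SketchReader(rows.iterator(), enough);
        List<String> out = run(reader, new SketchLimit(10, enough));
        System.out.println(out.size() + " records emitted, "
                + reader.recordsRead + " records read");
    }
}
```

In this sketch the reader checks the flag before every read, so it pulls no
more records from its source than the limit lets through. Whether the same
holds in Pig depends on where the real reader and operator sit in the
pipeline, which this sketch does not model.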

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
