optimize limit
--------------

                 Key: HIVE-908
                 URL: https://issues.apache.org/jira/browse/HIVE-908
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain
             Fix For: 0.5.0


If a query has a limit, all the mappers still have to run to completion, each 
creating up to 'limit' rows - this can be pretty expensive for a large file.

The following optimizations can be performed in this area:

1. Start fewer mappers if there is a limit - the compiler already knows about 
the limit before submitting a job, so it could increase the split size, 
thereby reducing the number of mappers.
2. Maintain a counter for the total number of output rows - the mappers can 
look at that counter and decide to exit early instead of each emitting 'limit' 
rows themselves.
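
The arithmetic behind the first option can be sketched as below. This is only 
an illustration, not Hive code: the class name, the method names and the 
"maxMappers" cap are hypothetical, and the mapper count is assumed to be 
roughly the input size divided by the split size. How the larger split size 
would actually be passed to Hadoop (for example through a setting such as 
mapred.min.split.size) is an assumption that the issue does not spell out.

```java
public class LimitSplitSketch {
    // Approximate number of mappers: ceil(inputBytes / splitBytes).
    public static long mapperCount(long inputBytes, long splitBytes) {
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    // Hypothetical policy: when a limit is present, grow the split size so
    // that no more than maxMappers mappers are started. The split size is
    // never reduced below the default.
    public static long splitSizeForLimit(long inputBytes,
                                         long defaultSplitBytes,
                                         long maxMappers) {
        long needed = (inputBytes + maxMappers - 1) / maxMappers;
        return Math.max(defaultSplitBytes, needed);
    }

    public static void main(String[] args) {
        long input = 64L * 1024 * 1024 * 1024;   // 64 GiB of input
        long defaultSplit = 64L * 1024 * 1024;   // 64 MiB default split
        long split = splitSizeForLimit(input, defaultSplit, 16);
        System.out.println(mapperCount(input, defaultSplit)
                + " -> " + mapperCount(input, split));
    }
}
```

With these made-up numbers the job would drop from 1024 mappers to 16. Each 
remaining mapper is handed a larger split, but with a limit it can stop 
reading as soon as it has emitted enough rows.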

Option 2 may be unreliable because of bugs in counters, but option 1 should 
definitely help.
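
The early-exit idea in the second option could look roughly like the sketch 
below. It is a single-process simulation, not Hadoop or Hive code: the shared 
AtomicLong stands in for a job-wide output row counter, and runMapper is a 
hypothetical name. A real counter would probably not be visible to every 
mapper immediately, so the total could overshoot the limit.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LimitCounterSketch {
    // Simulates one mapper: emit rows until either this mapper has produced
    // 'limit' rows itself or the shared counter shows the job already has
    // enough. Returns the number of rows this mapper emitted.
    public static long runMapper(long rowsInSplit, long limit,
                                 AtomicLong totalOutputRows) {
        long emitted = 0;
        for (long i = 0; i < rowsInSplit; i++) {
            if (emitted >= limit || totalOutputRows.get() >= limit) {
                break; // enough rows overall, exit early
            }
            emitted++;
            totalOutputRows.incrementAndGet();
        }
        return emitted;
    }

    public static void main(String[] args) {
        AtomicLong total = new AtomicLong();
        long limit = 100;
        // Three mappers with 1000 rows each, run one after another.
        long a = runMapper(1000, limit, total);
        long b = runMapper(1000, limit, total);
        long c = runMapper(1000, limit, total);
        System.out.println(a + " " + b + " " + c); // prints "100 0 0"
    }
}
```

Without the counter check each of the three mappers would emit 100 rows; with 
it, the later mappers exit without emitting anything.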

