optimize limit
--------------
Key: HIVE-908
URL: https://issues.apache.org/jira/browse/HIVE-908
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Fix For: 0.5.0
If a query has a limit, all the mappers still have to finish, and each one
emits up to 'limit' rows; this can be expensive for a large file.
The following optimizations can be performed in this area:
1. Start fewer mappers if there is a limit. The compiler already knows about
the limit before the job is submitted, so it could increase the split size and
thereby reduce the number of mappers.
2. Maintain a counter for the total number of output rows. Each mapper can check
that counter and exit once enough rows have been produced, instead of emitting
'limit' rows itself.
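Idea 1 can be sketched as follows. This is a minimal illustration, not Hive
code: the class `LimitSplitSizer`, its methods, and the `maxMappersForLimit`
cap are hypothetical, and enlarging the split so that at most a fixed number of
mappers start is an assumed heuristic. It also shows why fewer mappers help:
if each mapper may pass on up to 'limit' rows, the rows reaching the final
limit are bounded by (number of mappers) x limit.

```java
// Hypothetical sketch of idea 1: enlarge the split size when the query has a
// LIMIT so that fewer mappers are launched. Names and heuristic are illustrative.
public class LimitSplitSizer {

    /** Ceiling division for non-negative longs. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /**
     * Without a limit, keep the default split size. With a limit, enlarge the
     * split so that at most maxMappersForLimit splits are created
     * (maxMappersForLimit is an assumed tuning knob, not a real Hive setting).
     */
    static long chooseSplitSize(long totalInputBytes, long defaultSplitBytes,
                                boolean hasLimit, int maxMappersForLimit) {
        if (!hasLimit) {
            return defaultSplitBytes;
        }
        long enlarged = ceilDiv(totalInputBytes, maxMappersForLimit);
        return Math.max(defaultSplitBytes, enlarged);
    }

    /** One mapper per split. */
    static long numMappers(long totalInputBytes, long splitBytes) {
        return ceilDiv(totalInputBytes, splitBytes);
    }

    public static void main(String[] args) {
        long total = 100L << 30;        // 100 GB of input
        long defaultSplit = 128L << 20; // 128 MB default split
        long limit = 10;

        long before = numMappers(total, defaultSplit);
        long after = numMappers(total, chooseSplitSize(total, defaultSplit, true, 10));

        // Upper bound on rows handed to the final LIMIT: each mapper emits <= limit rows.
        System.out.println("default split:  " + before + " mappers, <= " + before * limit + " rows");
        System.out.println("enlarged split: " + after + " mappers, <= " + after * limit + " rows");
    }
}
```

With these example numbers the sketch goes from 800 mappers to 10, at the cost
of each mapper scanning a much larger split if it does not stop early.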
Option 2 may introduce bugs, since counters themselves can be buggy, but
option 1 should definitely help.
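Idea 2 and the caveat about counters can be illustrated with a small
simulation. It is a sketch under stated assumptions: threads stand in for
mappers and a `java.util.concurrent.atomic.AtomicLong` stands in for a job-wide
counter; it does not use the Hadoop counters API, and `CounterEarlyExit` and
`run` are hypothetical names. Because a mapper reads the counter and then
increments it in two separate steps, several mappers can pass the check
together, so the total can exceed the limit by up to one less than the number
of mappers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation of idea 2: mappers consult a shared "rows emitted"
// counter and stop early once the job-wide limit is reached. The AtomicLong is
// a stand-in for a job counter, NOT the Hadoop counters API.
public class CounterEarlyExit {

    /** Runs numMappers simulated mappers with rowsPerMapper input rows each; returns total rows emitted. */
    static long run(int numMappers, long rowsPerMapper, long limit) throws Exception {
        AtomicLong emitted = new AtomicLong(); // shared, job-wide count of emitted rows
        ExecutorService pool = Executors.newFixedThreadPool(numMappers);
        List<Future<Long>> results = new ArrayList<>();
        for (int m = 0; m < numMappers; m++) {
            results.add(pool.submit(() -> {
                long local = 0;
                // Each mapper still applies its own local limit.
                for (long row = 0; row < rowsPerMapper && local < limit; row++) {
                    // Check-then-act: another mapper may increment between this read
                    // and the increment below, so the job can overshoot slightly.
                    if (emitted.get() >= limit) {
                        break; // enough rows overall: exit instead of scanning the rest of the split
                    }
                    emitted.incrementAndGet(); // "emit" one row
                    local++;
                }
                return local;
            }));
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Without early exit, 8 mappers could emit up to 8 * 100 = 800 rows.
        System.out.println("rows emitted: " + run(8, 1_000_000, 100));
    }
}
```

A real job counter that mappers only see after some reporting delay would
presumably widen the overshoot window further, which is one concrete way the
counter-related bugs mentioned above could show up.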