[
https://issues.apache.org/jira/browse/HIVE-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771638#action_12771638
]
Ning Zhang commented on HIVE-908:
---------------------------------
I agree that for most cases, if the limit number if small, we should reduce the
number of mappers by increasing the split size. This is particularly true when
the limit can be pushed down to the TableScan operator. However if the the
query has joins or group-by, it could be more complicated.
I think a more general solution would be to introduce a limit operator and a
set of rewrite rules to push the limit operator down as much as possible. In
case of reduce-side joins and groupby, we cannot push the limit operator down
to the map side and it has to be on the reduce side. There are techniques that
make join and groupby limit-aware in the top-k query processing techniques (the
ranking function for limit is just a constant function). A survey can be found
at http://www.cs.uwaterloo.ca/~ilyas/papers/IlyasTopkSurvey.pdf.
> optimize limit
> --------------
>
> Key: HIVE-908
> URL: https://issues.apache.org/jira/browse/HIVE-908
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
> Fix For: 0.5.0
>
>
> If there is a limit, all the mappers have to finish and create 'limit' number
> of rows - this can be pretty expensive for a large file.
> The following optimizations can be performed in this area:
> 1. Start fewer mappers if there is a limit - before submitting a job, the
> compiler knows that there is a limit - so, it might be useful to increase the
> split size, thereby reducing the number of mappers.
> 2. A counter is maintained for the total outputs rows - the mappers can look
> at those counters and decide to exit instead of emitting 'limit' number of
> rows themselves.
> 2. may lead to some bugs because of bugs in counters, but 1. should
> definitely help
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.