[jira] Resolved: (HIVE-588) LIMIT n is slower than it needs to be

Namit Jain (JIRA) Wed, 16 Dec 2009 09:18:42 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Namit Jain resolved HIVE-588.
-----------------------------

    Resolution: Duplicate

Duplicate of http://issues.apache.org/jira/browse/HIVE-908

> LIMIT n is slower than it needs to be
> -------------------------------------
>
>                 Key: HIVE-588
>                 URL: https://issues.apache.org/jira/browse/HIVE-588
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Adam Kramer
>
> SELECT a FROM t LIMIT 10;
> ...simply prints the output of the first 10 lines of the first file in the 
> database. That's good.
> However,
> SELECT function(a) FROM t LIMIT 10;
> appears to send all of t to the mappers, run the function, and then 
> returns the first 10 rows from whatever mapper(s) finish first. This is very 
> slow in some cases!
> Appropriate behavior for LIMIT would be to use ONE mapper, and to push files 
> from the table into that mapper, and then auto-kill the mapper once it has 
> output 10 rows...just take the first 10 rows and kill the whole task if 
> necessary. On dying, throw some informative error message like, "Dying 
> intentionally; LIMIT has been reached." This should be the case even for 
> TRANSFORMs in the mapper...the TRANSFORM could spit out 20 rows, but once it 
> has spit out 10, the whole task should die and the 10 should be returned 
> immediately.
> The purpose of LIMIT is not just to have "only one response," but it's also 
> to speed up queries a whole lot. Running the function over the entire table 
> is a big waste.
> Obviously, when a reduce step is necessary, the whole table will have to be 
> pushed through mappers and then copied and then sorted--but in those cases, 
> as soon as 10 total rows have been output by any reducer(s), all reduce 
> tasks should be killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
