[
https://issues.apache.org/jira/browse/PIG-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205194#comment-13205194
]
Viraj Bhat commented on PIG-1270:
---------------------------------
In what version is this likely to be fixed?
Daniel, in your original comment the script mentioned is similar to a "SELECT *
.. LIMIT 10". Hive currently does not run an M/R job in these situations; it
just reads the data and streams it to stdout. Can we do a similar optimization
for the query mentioned?
Additionally, can we use some of the optimizations Hadoop 23 offers, such as
running in an uberized task rather than launching M/R jobs?
Viraj
> Push limit into loader
> ----------------------
>
> Key: PIG-1270
> URL: https://issues.apache.org/jira/browse/PIG-1270
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Attachments: PIG-1270-1.patch, PIG-1270-2.patch, PIG-1270-3.patch
>
>
> We can optimize the limit operation by stopping early in PigRecordReader. In
> general, we need a way to communicate between PigRecordReader and the
> execution pipeline: POLimit could instruct PigRecordReader that enough
> records have already been produced and that it should stop feeding more data.