[
https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150273#comment-17150273
]
László Bodor commented on TEZ-2103:
-----------------------------------
[~srahman]: I don't have the full context, but one quick comment: the Tez
codebase cannot contain Hive-specific code, e.g.:
{code}
vertex.numRows += task.getCounters().findCounter("HIVE",
"RECORDS_OUT_0").getValue();
{code}
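A rough sketch of one engine-agnostic alternative, assuming the counter group and name can be handed to the plugin by the caller (for example through its configuration) rather than being hard-coded. Everything here is hypothetical and is not Tez API: the class name PartialCompletionTracker, its methods, and the plain nested-map stand-in for per-task counters are for illustration only.

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of Tez): accumulates a row counter whose
 * group and name are supplied by the caller, so no engine-specific strings
 * such as "HIVE" need to live in this class.
 */
class PartialCompletionTracker {
    private final String counterGroup;
    private final String counterName;
    private final long rowLimit;
    private long rowsSeen = 0;

    PartialCompletionTracker(String counterGroup, String counterName, long rowLimit) {
        if (counterGroup == null || counterName == null) {
            throw new IllegalArgumentException("counter group and name are required");
        }
        if (rowLimit < 0) {
            throw new IllegalArgumentException("rowLimit must be non-negative");
        }
        this.counterGroup = counterGroup;
        this.counterName = counterName;
        this.rowLimit = rowLimit;
    }

    /**
     * Record the counters of one completed task.
     * The map is a stand-in for real task counters: group -> counter name -> value.
     * Tasks that do not report the configured counter contribute nothing.
     */
    void onTaskCompleted(Map<String, Map<String, Long>> taskCounters) {
        Map<String, Long> group = taskCounters.get(counterGroup);
        if (group == null) {
            return;
        }
        Long value = group.get(counterName);
        if (value != null) {
            rowsSeen += value;
        }
    }

    /** True once the completed tasks have produced at least rowLimit rows. */
    boolean limitReached() {
        return rowsSeen >= rowLimit;
    }

    long getRowsSeen() {
        return rowsSeen;
    }
}
```

With something along these lines, Hive would pass "HIVE" and "RECORDS_OUT_0" in from its side, and the Tez-side code would only check limitReached() before scheduling more tasks.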
> Implement a Partial completion VertexManagerPlugin
> --------------------------------------------------
>
> Key: TEZ-2103
> URL: https://issues.apache.org/jira/browse/TEZ-2103
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Gopal Vijayaraghavan
> Priority: Major
> Labels: gsoc, gsoc2015, hadoop, java, tez
> Attachments: TEZ-2103.01.patch, TEZ-2103.WIP.patch
>
>
> Currently, there is no sibling communication between tasks. This implies
> that a vertex's work can be completed by the first task in a wave of tasks,
> but the entire wave of tasks has to complete before success can be reported.
> This occurs in limit + filter query patterns that are common across the data
> access engines.
> {code}
> select * from data where x > 1 limit 10;
> {code}
> will run a full table scan's worth of tasks, each generating up to 10 rows,
> and then aggregate those rows to produce the final 10-row result.
> The VertexManager receives counters/events early enough to short-circuit the
> rest of the vertex's tasks, preventing the remaining tasks from being
> scheduled once the limit condition has been satisfied by an initial subset
> of the tasks.
> This is a specialization of the VertexManagerPlugin for this common-case
> scheduling pattern.