[
https://issues.apache.org/jira/browse/TEZ-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bikas Saha updated TEZ-779:
---------------------------
Attachment: TEZ-779.1.patch
Straightforward refactoring patch.
It moves the grouping code out of the InputFormat and exposes the grouping
methods as separate classes.
It also makes the task resource and total resource available to users via the
RootInput and VertexManager contexts.
These two values should be enough for anyone to work out their own custom
grouping.
Currently the total resource is the same as the DAG's total resource. As an
improvement, we could later divide the total DAG resource among vertices if
needed. It's not clear that it's a good idea in general.
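To illustrate how the task resource and total resource could drive a custom grouping, here is a rough sketch. It is not code from the patch: the class ResourceAwareGrouping, its method names, and the use of memory in MB as the resource unit are all assumptions made for the example. It estimates how many task-sized slots fit in the total resource and then packs splits greedily by size into that many groups.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not a class from the patch. */
final class ResourceAwareGrouping {

    /** How many task-sized slots fit in the total resource, capped by the split count. */
    static int desiredGroupCount(int totalMemoryMb, int taskMemoryMb, int numSplits) {
        if (taskMemoryMb <= 0) {
            throw new IllegalArgumentException("taskMemoryMb must be positive");
        }
        if (numSplits <= 0) {
            return 0;
        }
        int slots = Math.max(1, totalMemoryMb / taskMemoryMb);
        return Math.min(slots, numSplits);
    }

    /**
     * Greedy size-based grouping: walks the splits in order and closes a group
     * once it holds at least total/groupCount bytes (rounded up).
     * Returns the split indices that belong to each group.
     */
    static List<List<Integer>> groupBySize(long[] splitLengths, int groupCount) {
        List<List<Integer>> groups = new ArrayList<>();
        if (groupCount <= 0 || splitLengths.length == 0) {
            return groups;
        }
        long total = 0;
        for (long len : splitLengths) {
            total += len;
        }
        long target = (total + groupCount - 1) / groupCount; // ceiling division
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < splitLengths.length; i++) {
            current.add(i);
            currentBytes += splitLengths[i];
            // The last group absorbs whatever remains, so we never exceed groupCount groups.
            if (currentBytes >= target && groups.size() < groupCount - 1) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

A caller would read the two resource values from its context, pass them to desiredGroupCount, and hand the result to whichever grouping routine it prefers. The greedy packing here is only one possible policy and ignores locality.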
[~sseth] [~hitesh] please review.
> Make Tez grouped splits logic available outside of InputFormat
> --------------------------------------------------------------
>
> Key: TEZ-779
> URL: https://issues.apache.org/jira/browse/TEZ-779
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-779.1.patch
>
>
> Grouping currently fetches splits from the underlying file format.
> It'd be useful to allow grouping to accept a set of splits instead of always
> fetching them from the underlying format.
> One example of where this will be used: bucketed Hive data. Regular
> HiveInputFormat splits are generated, and only splits belonging to the same
> bucket can be grouped together.
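A minimal sketch of the bucketed case described above, under stated assumptions: BucketAwareGrouping, BucketSplit, and its fields are hypothetical stand-ins, not Hive or Tez types. The idea is to partition the already-generated splits by bucket first, so that a grouper only ever sees splits from one bucket at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; none of these types exist in Tez or Hive. */
final class BucketAwareGrouping {

    /** Stand-in for a split that has been tagged with its bucket number. */
    static final class BucketSplit {
        final int bucketId;
        final String path;
        final long length;

        BucketSplit(int bucketId, String path, long length) {
            this.bucketId = bucketId;
            this.path = path;
            this.length = length;
        }
    }

    /** Partitions splits by bucket so that grouping never mixes buckets. */
    static Map<Integer, List<BucketSplit>> partitionByBucket(List<BucketSplit> splits) {
        // TreeMap keeps buckets in ascending order, which makes the output deterministic.
        Map<Integer, List<BucketSplit>> byBucket = new TreeMap<>();
        for (BucketSplit split : splits) {
            byBucket.computeIfAbsent(split.bucketId, k -> new ArrayList<>()).add(split);
        }
        return byBucket;
    }
}
```

Each per-bucket list could then be passed to the grouping logic independently, which is only possible once grouping accepts a set of splits rather than fetching them itself.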