[ https://issues.apache.org/jira/browse/TEZ-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078895#comment-14078895 ]
Siddharth Seth commented on TEZ-1317:
-------------------------------------
+1 for the additional Descriptor which encapsulates the InputDescriptor and
InputInitializerDescriptor.
I think we have way too many methods to configure MRInput at this point;
despite that, many of the examples end up having to set up the InputDescriptor
and InputInitializerDescriptor separately (due to grouping being off by
default, or other non-standard configs).
{code}
public static byte[] createUserPayload(Configuration conf, String inputFormatClassName, boolean useNewApi, boolean groupSplitsInAM)
public static InputDescriptor createInputDescriptor(Configuration inputConf, Class<?> inputFormat, String inputPath)
public static InputDescriptor createInputDescriptor(Configuration inputConf, Class<?> inputFormat)
public static DataSourceDescriptor createDataSourceDescriptor(Configuration inputConf, Class<?> inputFormat)
public static DataSourceDescriptor createDataSourceDescriptor(Configuration inputConf, Class<?> inputFormat, String inputPath)
public static InputInitializerDescriptor createInputInitializerDescriptor()
{code}
For this, I was thinking of a builder (similar to the one for edges),
something along the lines of:
{code}
DataSourceDescriptor MRInput.configureFileBasedInput(Configuration conf, Class<?> inputFormat, String/Path path)
    .addAdditionalInputPath(String/Path path)
    .configureGrouping(GROUPING.OFF | GROUPING.AM | GROUPING.CLIENT)
    .done()
DataSourceDescriptor MRInput.configureInput(Configuration conf, Class<?> inputFormat)
    .configureGrouping(...)
    .done()
{code}
with sane defaults for enabling grouping etc.
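A rough, self-contained sketch of how such a builder could look is below. Everything in it is a stand-in for illustration: Grouping, SketchDataSourceDescriptor, SketchInputBuilder and MRInputSketch are hypothetical names, a plain Map replaces the Hadoop Configuration, paths are plain Strings, and the AM-side grouping default is only an assumption about what a "sane default" might be. For brevity a single builder class serves both entry points, whereas the proposal above keeps the extra-path method on the file-based builder only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All types here are hypothetical stand-ins, not the real Tez classes.
enum Grouping { OFF, AM, CLIENT }

// Immutable result of the builder: bundles what the input and its
// initializer would need to be configured together.
final class SketchDataSourceDescriptor {
    final Map<String, String> conf;
    final String inputFormatClassName;
    final List<String> inputPaths;
    final Grouping grouping;

    SketchDataSourceDescriptor(Map<String, String> conf, String inputFormatClassName,
                               List<String> inputPaths, Grouping grouping) {
        // Defensive copies so later changes to the builder do not leak in.
        this.conf = Collections.unmodifiableMap(new HashMap<String, String>(conf));
        this.inputFormatClassName = inputFormatClassName;
        this.inputPaths = Collections.unmodifiableList(new ArrayList<String>(inputPaths));
        this.grouping = grouping;
    }
}

final class SketchInputBuilder {
    private final Map<String, String> conf;
    private final Class<?> inputFormat;
    private final List<String> paths = new ArrayList<String>();
    // Assumed default; the comment above only asks for "sane defaults".
    private Grouping grouping = Grouping.AM;

    SketchInputBuilder(Map<String, String> conf, Class<?> inputFormat) {
        this.conf = conf;
        this.inputFormat = inputFormat;
    }

    SketchInputBuilder addAdditionalInputPath(String path) {
        paths.add(path);
        return this;
    }

    SketchInputBuilder configureGrouping(Grouping grouping) {
        this.grouping = grouping;
        return this;
    }

    // Terminal call: produces the single descriptor in one step.
    SketchDataSourceDescriptor done() {
        return new SketchDataSourceDescriptor(conf, inputFormat.getName(), paths, grouping);
    }
}

final class MRInputSketch {
    private MRInputSketch() {}

    // Entry point for file-based input formats: takes the first path up front.
    static SketchInputBuilder configureFileBasedInput(Map<String, String> conf,
                                                      Class<?> inputFormat, String path) {
        return new SketchInputBuilder(conf, inputFormat).addAdditionalInputPath(path);
    }

    // Entry point for input formats that are not file based.
    static SketchInputBuilder configureInput(Map<String, String> conf, Class<?> inputFormat) {
        return new SketchInputBuilder(conf, inputFormat);
    }

    public static void main(String[] args) {
        // Object.class stands in for a real InputFormat class.
        SketchDataSourceDescriptor d =
            configureFileBasedInput(new HashMap<String, String>(), Object.class, "/data/a")
                .addAdditionalInputPath("/data/b")
                .configureGrouping(Grouping.CLIENT)
                .done();
        System.out.println(d.inputPaths + " " + d.grouping);
    }
}
```

With this shape, an example that needs nothing unusual would only call configureFileBasedInput(...).done(), and grouping would be changed only when the default is not wanted.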
Alternatively, there could be separate terminal methods that return either a
DataSourceDescriptor or an InputDescriptor, if the latter is really required:
done() would return a DataSourceDescriptor, and createInputDescriptor() would
create an InputDescriptor.
File-based input formats are the ones used most often, hence a separate
builder for them.
The method to create the descriptor for MRInputAMSplitInitializer can reside in
that class itself.
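The alternative with two terminal methods, plus the initializer descriptor factory living on the initializer class, might be sketched as follows. All class and method names (AltInputDescriptor, AltInitializerDescriptor, AltDataSourceDescriptor, AltAMSplitInitializer, AltInputBuilder, groupSplitsInAM, createDescriptor) are hypothetical stand-ins rather than the Tez API, and leaving the initializer null when AM grouping is off is only one assumed way to model "no initializer".

```java
// Hypothetical stand-ins; the real Tez descriptor classes differ.
final class AltInputDescriptor {
    final String inputFormatClassName;

    AltInputDescriptor(String inputFormatClassName) {
        this.inputFormatClassName = inputFormatClassName;
    }
}

final class AltInitializerDescriptor {
    final String initializerClassName;

    AltInitializerDescriptor(String initializerClassName) {
        this.initializerClassName = initializerClassName;
    }
}

// The single entity that encapsulates both descriptors.
final class AltDataSourceDescriptor {
    final AltInputDescriptor input;
    // Null when no AM-side initializer is used (an assumption of this sketch).
    final AltInitializerDescriptor initializer;

    AltDataSourceDescriptor(AltInputDescriptor input, AltInitializerDescriptor initializer) {
        this.input = input;
        this.initializer = initializer;
    }
}

// Stand-in for the AM split initializer: the factory for its own
// descriptor resides on the class itself.
final class AltAMSplitInitializer {
    private AltAMSplitInitializer() {}

    static AltInitializerDescriptor createDescriptor() {
        return new AltInitializerDescriptor(AltAMSplitInitializer.class.getName());
    }
}

final class AltInputBuilder {
    private final Class<?> inputFormat;
    private boolean groupInAM = true; // assumed default

    AltInputBuilder(Class<?> inputFormat) {
        this.inputFormat = inputFormat;
    }

    AltInputBuilder groupSplitsInAM(boolean value) {
        this.groupInAM = value;
        return this;
    }

    // Terminal method for callers that really need only the InputDescriptor.
    AltInputDescriptor createInputDescriptor() {
        return new AltInputDescriptor(inputFormat.getName());
    }

    // Terminal method for the common case: both descriptors in one entity.
    AltDataSourceDescriptor done() {
        AltInitializerDescriptor init =
            groupInAM ? AltAMSplitInitializer.createDescriptor() : null;
        return new AltDataSourceDescriptor(createInputDescriptor(), init);
    }
}
```

Keeping the factory on the initializer class means the builder never has to know the initializer's class name, which is roughly the point of letting it reside there.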
> Simplify MRinput/MROutput configuration
> ---------------------------------------
>
> Key: TEZ-1317
> URL: https://issues.apache.org/jira/browse/TEZ-1317
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Priority: Blocker
> Attachments: TEZ-1317.1.patch, TEZ-1317.2.patch
>
>
> Should at least be possible to generate the correct Descriptors.
> Potentially change the addInput / addOutput APIs to accept a single entity
> which encapsulates InputDescriptor and InputInitializerDescriptor. Similarly
> for Outputs.