[ 
https://issues.apache.org/jira/browse/TEZ-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078895#comment-14078895
 ] 

Siddharth Seth commented on TEZ-1317:
-------------------------------------

+1 for the additional Descriptor which encapsulates the InputDescriptor and 
InputInitializerDescriptor.

I think we have way too many methods to configure MRInput at this point. 
Despite that, many of the examples end up having to set up the InputDescriptor 
and InputInitializerDescriptor separately (because grouping is off by 
default, or because of other non-standard configs).
{code}
public static byte[] createUserPayload(Configuration conf,
    String inputFormatClassName, boolean useNewApi, boolean groupSplitsInAM)
public static InputDescriptor createInputDescriptor(Configuration inputConf,
    Class<?> inputFormat, String inputPath)
public static InputDescriptor createInputDescriptor(Configuration inputConf,
    Class<?> inputFormat)
public static DataSourceDescriptor createDataSourceDescriptor(Configuration inputConf,
    Class<?> inputFormat)
public static DataSourceDescriptor createDataSourceDescriptor(Configuration inputConf,
    Class<?> inputFormat, String inputPath)
public static InputInitializerDescriptor createInputInitializerDescriptor()
{code}

For this, I was thinking of adding a builder (similar to what we do for the 
edges). Something along the lines of
{code}
DataSourceDescriptor MRInput.configureFileBasedInput(Configuration conf,
        Class<?> inputFormat, String/Path path)
    .addAdditionalInputPath(String/Path path)
    .configureGrouping(GROUPING.OFF | GROUPING.AM | GROUPING.CLIENT)
    .done()

DataSourceDescriptor MRInput.configureInput(Configuration conf,
        Class<?> inputFormat)
    .configureGrouping(...)
    .done()
{code}
with sane defaults for enabling grouping etc.
Alternatively, there could be separate methods that return either a 
DataSourceDescriptor or an InputDescriptor, if that's really required: done() 
would return a DataSourceDescriptor, and createInputDescriptor() would create 
an InputDescriptor.

File-based input formats are the ones used most often, hence a separate 
builder for them.
The method to create the descriptor for MRInputAMSplitInitializer can reside in 
that class itself.
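
To make the builder idea above a bit more concrete, here is a rough, self-contained sketch of the shape I have in mind. Everything in it is a hypothetical stand-in: MRInputBuilderSketch, FileInputBuilder, DataSourceSketch and the Grouping enum are not existing Tez classes, the Configuration argument is left out to keep it dependency-free, and the default of Grouping.AM is only an assumption about what a "sane default" might be.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: none of these types are real Tez classes.
public class MRInputBuilderSketch {

    // Mirrors GROUPING.OFF | GROUPING.AM | GROUPING.CLIENT from the proposal.
    enum Grouping { OFF, AM, CLIENT }

    // Stand-in for a DataSourceDescriptor: one object carrying everything
    // needed to describe the input (format, paths, grouping mode).
    static final class DataSourceSketch {
        final String inputFormatClassName;
        final List<String> inputPaths;
        final Grouping grouping;

        DataSourceSketch(String inputFormatClassName, List<String> inputPaths,
                         Grouping grouping) {
            this.inputFormatClassName = inputFormatClassName;
            // Defensive copy so later builder calls cannot change this result.
            this.inputPaths = Collections.unmodifiableList(new ArrayList<>(inputPaths));
            this.grouping = grouping;
        }
    }

    // Builder for file-based inputs, the most common case.
    static final class FileInputBuilder {
        private final String inputFormatClassName;
        private final List<String> inputPaths = new ArrayList<>();
        // Assumed default; the proposal only asks for "sane defaults".
        private Grouping grouping = Grouping.AM;

        private FileInputBuilder(Class<?> inputFormat, String path) {
            this.inputFormatClassName = inputFormat.getName();
            this.inputPaths.add(path);
        }

        FileInputBuilder addAdditionalInputPath(String path) {
            inputPaths.add(path);
            return this;
        }

        FileInputBuilder configureGrouping(Grouping grouping) {
            this.grouping = grouping;
            return this;
        }

        // Terminal call: produces the single encapsulating descriptor.
        DataSourceSketch done() {
            return new DataSourceSketch(inputFormatClassName, inputPaths, grouping);
        }
    }

    static FileInputBuilder configureFileBasedInput(Class<?> inputFormat, String path) {
        return new FileInputBuilder(inputFormat, path);
    }

    public static void main(String[] args) {
        // String.class is only a placeholder for a real InputFormat class.
        DataSourceSketch source = configureFileBasedInput(String.class, "/data/part1")
            .addAdditionalInputPath("/data/part2")
            .configureGrouping(Grouping.CLIENT)
            .done();
        System.out.println(source.inputFormatClassName + " "
            + source.inputPaths + " " + source.grouping);
    }
}
```

The point of the sketch is that callers who are happy with the defaults only need configureFileBasedInput(...).done(), while the less common options stay available as chained calls instead of as more static overloads.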


> Simplify MRinput/MROutput configuration
> ---------------------------------------
>
>                 Key: TEZ-1317
>                 URL: https://issues.apache.org/jira/browse/TEZ-1317
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: TEZ-1317.1.patch, TEZ-1317.2.patch
>
>
> Should at least be possible to generate the correct Descriptors.
> Potentially change the addInput / addOutput APIs to accept a single entity 
> which encapsulates InputDescriptor and InputInitializerDescriptor. Similarly 
> for Outputs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
