[ 
https://issues.apache.org/jira/browse/TEZ-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082972#comment-14082972
 ] 

Siddharth Seth commented on TEZ-1317:
-------------------------------------

Comments.
- Should MRInputConfigurer be a proper builder, so that when create is called 
it returns an instance which can then be used to create the relevant 
DataSourceDescriptor? This is mainly for methods like getCredentials, which 
must be called only after create is invoked.
- The credentials addition is really useful. However, it requires users to 
create an actual instance of the Configurer: Configurer c; 
addInput(c.create()); credentials.add(c.getCredentials()). If we could change 
this to somehow add the Credentials directly, that'll be awesome. One possible 
way to do that is to add Credentials to the DataSource/DataSinkDescriptor - 
which can then be accessed during DAG construction. This simplifies usage 
quite a bit, since this API will have to be used if a job is written to run on 
a secure cluster.
- The Output should likely be using the same pattern. Credentials apply to the 
output as well.
- On the Input, addInputPaths - is this expected to be a CSV string? There was 
a jira on Hadoop to accept this as a list - which is likely more useful. For 
now, I think this is good - and we can add an API later if required. It should 
probably be renamed to setInputPaths.
- The exception message can be confusing if using a custom input format which 
accepts paths but is not a FileInputFormat. The message could explicitly say - 
"Only supported for FileInputFormat, configure custom file based InputFormats 
directly in the Configuration"
- s/getConfigurer/createConfigurer && s/create()/configure() ?
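
To make the Credentials suggestion above concrete, here is a minimal sketch of 
what attaching Credentials to the descriptor could look like. Every type below 
(Credentials, DataSourceDescriptor, InputConfigurer, Dag) is a hypothetical 
stand-in written for this illustration, not the actual Tez or Hadoop class, and 
the "token-for-" strings only mimic real delegation tokens. The point is just 
that create() returns a descriptor that already carries its credentials, so the 
DAG can collect them in addInput without the user keeping the configurer around.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins only; none of these are the real Tez/Hadoop classes.
public class CredentialsSketch {

    /** Stand-in for a credentials holder: here just a list of token names. */
    static final class Credentials {
        private final List<String> tokens = new ArrayList<>();

        void addToken(String token) { tokens.add(token); }

        void addAll(Credentials other) { tokens.addAll(other.tokens); }

        List<String> tokens() { return Collections.unmodifiableList(tokens); }
    }

    /** Stand-in descriptor that carries the credentials gathered at create time. */
    static final class DataSourceDescriptor {
        private final String inputPaths;
        private final Credentials credentials;

        DataSourceDescriptor(String inputPaths, Credentials credentials) {
            this.inputPaths = inputPaths;
            this.credentials = credentials;
        }

        String getInputPaths() { return inputPaths; }

        Credentials getCredentials() { return credentials; }
    }

    /** Builder-style configurer: nothing is readable until create() is called. */
    static final class InputConfigurer {
        private String inputPaths;

        static InputConfigurer createConfigurer() { return new InputConfigurer(); }

        InputConfigurer setInputPaths(String csvPaths) {
            this.inputPaths = csvPaths;
            return this;
        }

        DataSourceDescriptor create() {
            if (inputPaths == null) {
                throw new IllegalStateException("input paths must be set before create()");
            }
            // Assumption: one fake token per comma-separated path.
            Credentials creds = new Credentials();
            for (String path : inputPaths.split(",")) {
                creds.addToken("token-for-" + path.trim());
            }
            return new DataSourceDescriptor(inputPaths, creds);
        }
    }

    /** Stand-in DAG that merges descriptor credentials during construction. */
    static final class Dag {
        private final Credentials credentials = new Credentials();

        Dag addInput(DataSourceDescriptor descriptor) {
            credentials.addAll(descriptor.getCredentials());
            return this;
        }

        Credentials getCredentials() { return credentials; }
    }

    public static void main(String[] args) {
        Dag dag = new Dag().addInput(
            InputConfigurer.createConfigurer()
                .setInputPaths("/data/a,/data/b")
                .create());
        System.out.println(dag.getCredentials().tokens());
    }
}
```

With this shape the user never calls getCredentials on the configurer at all, 
which also sidesteps the "only valid after create" ordering problem from the 
first comment. The same pattern would presumably apply to a DataSinkDescriptor 
on the output side.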

> Simplify MRinput/MROutput configuration
> ---------------------------------------
>
>                 Key: TEZ-1317
>                 URL: https://issues.apache.org/jira/browse/TEZ-1317
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: TEZ-1317.1.patch, TEZ-1317.2.patch, TEZ-1317.3.patch, 
> TEZ-1317.3.patch, TEZ-1317.4.patch
>
>
> Should at least be possible to generate the correct Descriptors.
> Potentially change the addInput / addOutput APIs to accept a single entity 
> which encapsulates InputDescriptor and InputInitializerDescriptor. Similarly 
> for Outputs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
