[
https://issues.apache.org/jira/browse/TEZ-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082972#comment-14082972
]
Siddharth Seth commented on TEZ-1317:
-------------------------------------
Comments.
- Should MRInputConfigurer be a proper builder, so that calling create returns
an instance which can then be used to create the relevant
DataSourceDescriptor? This matters mainly for methods like getCredentials,
which must be called only after create is invoked.
- The credentials addition is really useful. However, it requires users to
create an actual instance of the Configurer: Configurer c;
addInput(c.create()); credentials.add(c.getCredentials()). If we could change
this to somehow add the Credentials directly, that'll be awesome. One possible
way to do that is to add Credentials to the DataSource/DataSinkDescriptor -
which can then be accessed during DAG construction. This simplifies usage quite
a bit, since this API will have to be used if a job is written to run on a
secure cluster.
- The Output should likely be using the same pattern. Credentials apply to the
output as well.
- On the Input, addInputPaths - is this expected to be a CSV string? There was
a jira on Hadoop to accept this as a list - which is likely more useful. For
now, I think this is good - and we can add an API later if required. Should
probably be renamed to setInputPaths.
- The exception message can be confusing if using a custom input format which
accepts paths but is not a FileInputFormat. The message could explicitly say -
"Only supported for FileInputFormat, configure custom file based InputFormats
directly in the Configuration"
- s/getConfigurer/createConfigurer && s/create()/configure() ?
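
To make the suggestions above concrete, here is a rough sketch of what the
builder shape could look like. It is not the patch's code and does not use the
real Tez or Hadoop classes: SketchInputConfigurer, SketchDataSourceDescriptor,
SketchDag, addCredential and the Map<String, String> standing in for
Credentials are all hypothetical names chosen for illustration. It assumes the
descriptor carries its own credentials, that paths can be given either as a
list or as a CSV string, and that the path setter fails with the explicit
message proposed above when the format is not file based.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DataSourceDescriptor: immutable, and it carries
// the credentials collected while the input was being configured.
final class SketchDataSourceDescriptor {
    private final List<String> inputPaths;
    private final Map<String, String> credentials;

    SketchDataSourceDescriptor(List<String> inputPaths, Map<String, String> credentials) {
        this.inputPaths = Collections.unmodifiableList(new ArrayList<>(inputPaths));
        this.credentials = Collections.unmodifiableMap(new LinkedHashMap<>(credentials));
    }

    List<String> getInputPaths() { return inputPaths; }

    Map<String, String> getCredentials() { return credentials; }
}

// Hypothetical builder: nothing is readable until create() returns a descriptor,
// so there is no getCredentials() that could be called too early.
final class SketchInputConfigurer {
    private final String inputFormatName;
    private final boolean fileBased; // assumption: the caller knows whether the format is a FileInputFormat
    private final List<String> inputPaths = new ArrayList<>();
    private final Map<String, String> credentials = new LinkedHashMap<>();

    private SketchInputConfigurer(String inputFormatName, boolean fileBased) {
        this.inputFormatName = inputFormatName;
        this.fileBased = fileBased;
    }

    static SketchInputConfigurer createConfigurer(String inputFormatName, boolean fileBased) {
        return new SketchInputConfigurer(inputFormatName, fileBased);
    }

    // List-based setter; replaces any previously set paths.
    SketchInputConfigurer setInputPaths(List<String> paths) {
        if (!fileBased) {
            throw new IllegalStateException(inputFormatName
                + ": Only supported for FileInputFormat, configure custom file based"
                + " InputFormats directly in the Configuration");
        }
        inputPaths.clear();
        inputPaths.addAll(paths);
        return this;
    }

    // CSV convenience overload that delegates to the list-based setter.
    SketchInputConfigurer setInputPaths(String commaSeparatedPaths) {
        return setInputPaths(Arrays.asList(commaSeparatedPaths.split(",")));
    }

    SketchInputConfigurer addCredential(String alias, String token) {
        credentials.put(alias, token);
        return this;
    }

    SketchDataSourceDescriptor create() {
        return new SketchDataSourceDescriptor(inputPaths, credentials);
    }
}

// Hypothetical DAG-side holder: credentials are merged when the input is added,
// so the user never has to copy them over by hand.
final class SketchDag {
    private final List<SketchDataSourceDescriptor> inputs = new ArrayList<>();
    private final Map<String, String> credentials = new LinkedHashMap<>();

    SketchDag addInput(SketchDataSourceDescriptor descriptor) {
        inputs.add(descriptor);
        credentials.putAll(descriptor.getCredentials());
        return this;
    }

    Map<String, String> getCredentials() {
        return Collections.unmodifiableMap(credentials);
    }
}
```

With this shape, usage collapses to a single chained expression such as
dag.addInput(SketchInputConfigurer.createConfigurer(...).setInputPaths(...).create()),
and the separate credentials.add(c.getCredentials()) step goes away. The same
pattern would presumably carry over to the Output side.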
> Simplify MRinput/MROutput configuration
> ---------------------------------------
>
> Key: TEZ-1317
> URL: https://issues.apache.org/jira/browse/TEZ-1317
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Priority: Blocker
> Attachments: TEZ-1317.1.patch, TEZ-1317.2.patch, TEZ-1317.3.patch,
> TEZ-1317.3.patch, TEZ-1317.4.patch
>
>
> Should at least be possible to generate the correct Descriptors.
> Potentially change the addInput / addOutput APIs to accept a single entity
> which encapsulates InputDescriptor and InputInitializerDescriptor. Similarly
> for Outputs.