[ https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512127#comment-14512127 ]
Hitesh Shah commented on TEZ-2368:
----------------------------------
What I meant was that a hashcode would work given that the name is unique. In
any case, why does a dag number need to be exposed to user code? Isn't the
unique id sufficient?
If the end goal is per-dag data that the framework is able to clean up, then
the framework should create dag-specific dirs before passing them to user-land
code and clean up these dirs when the dag has completed.
External services are not meant to be using these context classes in any case.
Or am I missing something?
We can add this API, but I am not sure I see a need for it currently.
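
A minimal sketch of the framework-managed approach described in this comment,
assuming a local filesystem and a per-application root directory. The class
DagScopedDirs, its method names, and the "dag_<n>" layout are hypothetical and
are not part of the Tez API; the framework would call createDagDir before
handing the path to user code and cleanupDagDir once the dag completes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not a Tez class.
final class DagScopedDirs {
    private final Path appRoot;

    DagScopedDirs(Path appRoot) {
        this.appRoot = appRoot;
    }

    // The "dag_<number>" naming is an assumption made for this sketch.
    Path dagDir(int dagNumber) {
        return appRoot.resolve("dag_" + dagNumber);
    }

    // Called by the framework before user code receives the directory.
    Path createDagDir(int dagNumber) throws IOException {
        return Files.createDirectories(dagDir(dagNumber));
    }

    // Called by the framework when the dag has completed.
    void cleanupDagDir(int dagNumber) throws IOException {
        Path dir = dagDir(dagNumber);
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

With this shape, user code only ever sees a ready-made directory and never
needs the dag number itself, which is the point being argued here.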
> Make the dag number available in Context classes
> ------------------------------------------------
>
> Key: TEZ-2368
> URL: https://issues.apache.org/jira/browse/TEZ-2368
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-2368.1.txt
>
>
> Provide the dag number, which is a unique number, for each dag running within
> an application in the TezInputContext, TezOutputContext, TezProcessorContext.
> When containers are re-used, or for external services, this can be used to
> generate intermediate data to a dag specific directory instead of an
> application specific directory, where it becomes difficult to differentiate
> between different dags.
> The DAG name does provide this - but is not suitable for use in a directory
> name. Hashing the name is an option, but can lead to collisions.
> Generating data into a dag specific directory will eventually only be usable
> when we move away from the default MR handler, or enhance it to support an
> additional parameter.
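
To illustrate the collision concern in the description above, here is a small
sketch comparing a directory name derived from a hash of the dag name with one
derived from the dag number. The class DagDirNames and both methods are
hypothetical, and String.hashCode() is only one possible hash. The names "Aa"
and "BB" are a well-known pair with the same String.hashCode() value.

```java
// Hypothetical helper for illustration only; not a Tez class.
final class DagDirNames {
    private DagDirNames() {
    }

    // Directory name from a hash of the dag name. Distinct names can
    // map to the same value, e.g. "Aa" and "BB".
    static String fromNameHash(String dagName) {
        return "dag_" + Integer.toHexString(dagName.hashCode());
    }

    // Directory name from the dag number, which the description states
    // is unique for each dag running within an application.
    static String fromDagNumber(int dagNumber) {
        return "dag_" + dagNumber;
    }
}
```

Here DagDirNames.fromNameHash("Aa") and DagDirNames.fromNameHash("BB") return
the same string, while fromDagNumber(1) and fromDagNumber(2) do not.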