[ 
https://issues.apache.org/jira/browse/TEZ-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248608#comment-15248608
 ] 

Hitesh Shah commented on TEZ-3215:
----------------------------------

bq. If dynamic Output addition is being worked on or is already part of the 
future plan to support other scenarios

I don't believe this is the case currently. This would require having a 
VertexManager modify a vertex to add a new data sink before any task can run 
and additionally add committers for them. Could you shed some more light on how 
you would see this being used? 

From my perspective, if this is just a question of writing to different files 
on HDFS, a single Output that handles this kind of functionality should likely 
suffice. Multiple outputs for data sinks may only make sense if the data is 
being handled in vastly different ways (different visibility on completion 
via different committers, or if the processor itself needs to know about the 
different outputs). For intermediate vertices, this would be more complex, as a 
new output would now imply creating a new edge too.
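
To illustrate the "single Output" idea, here is a minimal sketch of one writer 
that fans records out to per-key files, opening each file lazily the first time 
its key is seen. It is not Tez or MapReduce code: the class name 
KeyedFileWriter, the ".txt" suffix and the use of local java.nio files instead 
of HDFS are all assumptions made for illustration, and commit/abort handling is 
left out entirely. It also assumes every key is safe to use as a file name.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Tez or MapReduce: a single writer that
 * routes each record to a file named after the record's key.
 */
final class KeyedFileWriter implements AutoCloseable {
    private final Path baseDir;
    // One open writer per key seen so far, created on first use.
    private final Map<String, BufferedWriter> recordWriters = new HashMap<>();

    KeyedFileWriter(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Appends value as one line to the file for key (assumed to be a safe file name). */
    void write(String key, String value) throws IOException {
        BufferedWriter writer = recordWriters.get(key);
        if (writer == null) {
            // Keys are not known ahead of time, so the file is opened lazily.
            writer = Files.newBufferedWriter(
                    baseDir.resolve(key + ".txt"), StandardCharsets.UTF_8);
            recordWriters.put(key, writer);
        }
        writer.write(value);
        writer.newLine();
    }

    /** Closes every open writer; rethrows the first failure after trying them all. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (BufferedWriter writer : recordWriters.values()) {
            try {
                writer.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        recordWriters.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

A real Output would also need to bound the number of simultaneously open 
writers and make all of the files visible atomically through a single 
committer, which this sketch does not attempt.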

> Support for MultipleOutputs
> ---------------------------
>
>                 Key: TEZ-3215
>                 URL: https://issues.apache.org/jira/browse/TEZ-3215
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ming Ma
>
> Here is the use case. A reducer might write its output to more than one file. 
> The file name will be based on the mapper key. We don't know all possible 
> keys ahead of time. In MR, MultipleOutputs provides such support. I couldn't 
> find anything readily available in Tez.
> * Setting up one DataSink per file ahead of time won't work, as we don't know 
> all possible keys.
> * Using MR MultipleOutputs directly from the Tez application processor: it 
> isn't clear how to pass TaskInputOutputContext to MultipleOutputs.
> * Tez MROutput can create a DataSink based on the specified outputFormat, but 
> it can't take MR MultipleOutputs.
> I ended up modifying Tez MROutput with a HashMap {{recordWriters}} to achieve 
> this. If this is a solved problem, can anyone explain how to do it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
