[
https://issues.apache.org/jira/browse/TEZ-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494702#comment-15494702
]
Zhiyuan Yang commented on TEZ-3215:
-----------------------------------
Two more questions:
1. Why do we want to remove this line in MROutput.KeyValueWriter.write? Does
this break TEZ-2918?
{code}
getContext().notifyProgress();
{code}
2. Why do we increment the counter by 1 in MROutputs.flush?
{code}
outputRecordCounter.increment(1);
{code}
> Support for MultipleOutputs
> ---------------------------
>
> Key: TEZ-3215
> URL: https://issues.apache.org/jira/browse/TEZ-3215
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: TEZ-3215-2.patch, TEZ-3215-3.patch, TEZ-3215.patch
>
>
> Here is the use case. A reducer might write its output to more than one file.
> The file name will be based on the mapper key. We don't know all possible
> keys ahead of time. In MR, MultipleOutputs provides such support. I couldn't
> find anything readily available in Tez.
> * Setting up one DataSink per file ahead of time won't work, as we don't know
> all possible keys.
> * Using MR MultipleOutputs directly from the Tez application processor: it
> isn't clear how to pass TaskInputOutputContext to MultipleOutputs.
> * Tez MROutput can create a DataSink based on the specified outputFormat, but
> it can't take MR MultipleOutputs.
> I ended up modifying Tez MROutput with a HashMap {{recordWriters}} to achieve
> this. If this is a solved problem, can anyone explain how to do it?
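
For readers following the thread, here is a minimal sketch of the idea in the quoted description: keep a map from output name to writer and create each writer lazily the first time its name is seen. This is not the code in the attached patches and it uses no Tez or Hadoop classes. The class name KeyedRecordWriters, its methods, and the in-memory StringBuilder buffers are all hypothetical stand-ins chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Tez's MROutput and not the TEZ-3215 patch.
class KeyedRecordWriters {
    // One buffer per output name, standing in for a real per-file record writer.
    private final Map<String, StringBuilder> recordWriters = new HashMap<>();
    // Total number of records written across all outputs.
    private long outputRecords = 0;

    // Writes one record to the output identified by fileName,
    // creating that output on first use.
    void write(String fileName, String key, String value) {
        StringBuilder writer =
            recordWriters.computeIfAbsent(fileName, name -> new StringBuilder());
        writer.append(key).append('\t').append(value).append('\n');
        outputRecords++;
    }

    // Number of distinct outputs opened so far.
    int openWriterCount() {
        return recordWriters.size();
    }

    // Number of records written so far.
    long recordCount() {
        return outputRecords;
    }

    // Everything written to fileName, or an empty string if it was never opened.
    String contents(String fileName) {
        StringBuilder writer = recordWriters.get(fileName);
        return writer == null ? "" : writer.toString();
    }
}
```

In this sketch the record count is incremented once per written record, and the set of outputs does not need to be known ahead of time. A real implementation would also have to close and commit each underlying writer, which is omitted here.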