[ https://issues.apache.org/jira/browse/TEZ-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494889#comment-15494889 ]
TezQA commented on TEZ-3215:
----------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12828745/TEZ-3215-5.patch
  against master revision b17edc4.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1968//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1968//console

This message is automatically generated.

> Support for MultipleOutputs
> ---------------------------
>
>                 Key: TEZ-3215
>                 URL: https://issues.apache.org/jira/browse/TEZ-3215
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: TEZ-3215-2.patch, TEZ-3215-3.patch, TEZ-3215-4.patch,
>                      TEZ-3215-5.patch, TEZ-3215.patch
>
>
> Here is the use case. A reducer might write its output to more than one file.
> The file name will be based on the mapper key. We don't know all possible
> keys ahead of time. In MR, MultipleOutputs provides such support. I couldn't
> find anything readily available in Tez.
> * Setting up one DataSink per file ahead of time won't work, as we don't know
> all possible keys.
> * Using MR MultipleOutputs directly from the Tez application processor: it
> isn't clear how to pass TaskInputOutputContext to MultipleOutputs.
> * Tez MROutput can create a DataSink based on the specified outputFormat, but
> it can't take MR MultipleOutputs.
> I ended up modifying Tez MROutput with a HashMap {{recordWriters}} to achieve
> this. If this is a solved problem, can anyone explain how to do it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)