[jira] [Commented] (GIRAPH-1000) Multi Output support

Lukas Nalezenec (JIRA) Wed, 25 Mar 2015 04:23:56 -0700

    [ 
https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379707#comment-14379707
 ]


Lukas Nalezenec commented on GIRAPH-1000:
-----------------------------------------

I have never used Hadoop MultipleOutputs. I evaluated it when it was new, but 
it was hard to unit test, so we decided to replace it in MapReduce with our own 
internal implementation. In my humble opinion, MultipleOutputs is badly 
designed. Just my two cents.

I think there is not much documentation on Giraph internals, so you will have 
to read the source code. The code is well written and you will learn a lot. I 
don't know much about these parts of Giraph, but if I know the answer I will 
help you.

> Multi Output support
> --------------------
>
>                 Key: GIRAPH-1000
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1000
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, conf and scripts, graph
>    Affects Versions: 1.0.0, 1.1.0, 1.2.0-SNAPSHOT
>            Reporter: Alessio Arleo
>              Labels: features
>
> Hadoop natively supports multiple outputs. The objective is to extend Giraph 
> to support multiple output formats during a single Giraph run.
> According to the official Hadoop API docs*, the pattern for using multiple 
> outputs is the following:
> - Modify the job submission to declare the additional outputs
> - Modify the reducer class to write to the declared outputs
> Since Giraph jobs are executed as mappers, this approach (or at least its 
> second part) is probably not feasible, so further investigation is necessary.
> *https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
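
To make the two-step pattern in the quoted description concrete, here is a 
minimal plain-Java sketch of the idea: declare named outputs up front, then 
route each record to one of them by name. This is NOT Hadoop's MultipleOutputs 
API and not Giraph code. The class NamedOutputs, its methods, and the output 
names "ranks" and "degrees" are all hypothetical, chosen only for illustration. 
Because the sinks are ordinary java.io.Writer objects, the routing logic can be 
unit tested with a StringWriter and no cluster.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the "named outputs" idea; not a Hadoop class. */
final class NamedOutputs implements AutoCloseable {
    private final Map<String, Writer> sinks = new LinkedHashMap<>();

    /** Step 1 (analogous to modifying the job submission): declare an output by name. */
    void addNamedOutput(String name, Writer sink) {
        if (sinks.containsKey(name)) {
            throw new IllegalArgumentException("Output already declared: " + name);
        }
        sinks.put(name, sink);
    }

    /** Step 2 (analogous to writing from the task): route one record to a declared output. */
    void write(String name, Object key, Object value) throws IOException {
        Writer sink = sinks.get(name);
        if (sink == null) {
            throw new IllegalArgumentException("Undeclared output: " + name);
        }
        // Tab-separated key/value, one record per line (an assumed format).
        sink.write(key + "\t" + value + "\n");
    }

    /** Closes every declared output. */
    @Override
    public void close() throws IOException {
        for (Writer sink : sinks.values()) {
            sink.close();
        }
    }
}

final class NamedOutputsDemo {
    public static void main(String[] args) throws IOException {
        StringWriter ranks = new StringWriter();
        StringWriter degrees = new StringWriter();
        try (NamedOutputs out = new NamedOutputs()) {
            out.addNamedOutput("ranks", ranks);
            out.addNamedOutput("degrees", degrees);
            // One pass over a vertex emits a record to each output.
            out.write("ranks", 1L, 0.25);
            out.write("degrees", 1L, 3);
        }
        System.out.print(ranks);
        System.out.print(degrees);
    }
}
```

Whether Giraph's output path can be adapted this way is exactly the open 
question of this issue; the sketch only shows the shape of the pattern.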



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
