[ https://issues.apache.org/jira/browse/GIRAPH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583491#comment-13583491 ]

Eli Reisman commented on GIRAPH-524:
------------------------------------

Good discussion, and good points. I guess that essentially our output formats 
can (and do) output the whole resulting graph state at the end of a job, and 
can choose to output vertex values, edge weights, simply the graph itself, or 
any combo of the above. 
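
To make the "any combo" point concrete, here is a minimal Python sketch. It is
not Giraph's output format API. The function name and the tab-separated layout
(vertex id, then an optional value, then an optional dst:weight edge list) are
hypothetical. They show only that one record per vertex can carry whichever
graph elements the job chooses to emit.

```python
def format_record(vertex_id, value, edges, include_value=True, include_edges=True):
    """Render one vertex's worth of graph data as a single text line.

    Hypothetical layout: id [TAB value] [TAB dst:weight,dst:weight,...]
    """
    fields = [str(vertex_id)]
    if include_value:
        # Vertex value, e.g. a rank or a component label.
        fields.append(str(value))
    if include_edges:
        # Outgoing edges with their weights, comma-separated.
        fields.append(",".join(f"{dst}:{w}" for dst, w in edges))
    return "\t".join(fields)


edges = [(2, 0.5), (3, 1.0)]
print(format_record(1, 7, edges))                       # value and edges
print(format_record(1, 7, edges, include_value=False))  # graph structure only
print(format_record(1, 7, edges, include_edges=False))  # vertex values only
```

Whatever fields are included, the granularity stays one vertex per record. Only
the selection of fields changes, and that selection is what an output format
name could describe.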

It's a great point that, regardless of the data, this is output at the
granularity of "one vertex worth of graph data per record". But along that same
line of thinking, our output inevitably ends up as semi-structured input data
for another MR job. In that situation, we tell MR what it thinks it's seeing
per record and what to do with it. So it's really just MR input data, the same
as any other semi-structured input to a Pig job or whatever.

The Giraph job has produced useful output data. It doesn't matter to the next
user of that data whether the values in each record are edge weights or vertex
values. That user is just processing the data in each record for its own
workflow. It only needs to know that each record contains values that pass its
sanity checks and are formatted correctly for extraction into its data
structures. How the data was produced only mattered to Giraph.
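
A downstream consumer's side of this might look like the following Python
sketch. It assumes a hypothetical id<TAB>value<TAB>dst:weight,... record layout,
which is an illustration and not a Giraph-defined format. The parser only
checks that each record is well formed and extracts the fields. It never needs
to know the values came out of a graph computation.

```python
def parse_record(line):
    """Parse one hypothetical 'id<TAB>value<TAB>dst:weight,...' record.

    Returns (vertex_id, value, [(dst, weight), ...]), or None if the record
    fails basic sanity checks. The record is treated as plain delimited text.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None  # wrong number of columns
    try:
        vertex_id = int(fields[0])
        value = float(fields[1])
        edges = []
        if fields[2]:
            for pair in fields[2].split(","):
                dst, weight = pair.split(":")  # raises ValueError if malformed
                edges.append((int(dst), float(weight)))
    except ValueError:
        return None  # a field could not be converted or split as expected
    return vertex_id, value, edges


lines = ["1\t7\t2:0.5,3:1.0", "2\tabc\t", "3\t1.5\t"]
good = [r for r in (parse_record(l) for l in lines) if r is not None]
print(good)
```

Malformed records are simply skipped here. A real job might count or log them
instead.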

Does this make any sense? Or do you find in practice that some assumptions
about the "graph structure" of the data as it was produced are expected or
used in, say, a Hive job run afterwards on that same output data? I guess if
the data is going right back into another Giraph job, this would be the case.

                
> Giraph can receive input from vertex or edge-centric data sets; its output is 
> graph data, not "vertices"
> --------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-524
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-524
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>            Reporter: Eli Reisman
>            Priority: Minor
>             Fix For: 0.2.0
>
>
> It is silly to have any of our Output format names tied to the "vertex" when 
> in fact we are just outputting graph data. The output format names should 
> reflect the formatting of the output, and perhaps which elements of the graph 
> data you want in the output.
> Let's change those names? They get shorter, too, as a bonus.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
