[ https://issues.apache.org/jira/browse/GIRAPH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583491#comment-13583491 ]
Eli Reisman commented on GIRAPH-524:
------------------------------------
Good discussion, and good points. I guess that essentially our output formats
can (and do) output the whole resulting graph state at the end of a job, and
can choose to output vertex values, edge weights, simply the graph itself, or
any combo of the above.
It's a great point that, regardless of the data, this is output at the granularity
of "one vertex worth of graph data per record". But on that same line of
thinking, our output inevitably ends up as semi-structured input data for
another MR job. In that situation, we tell MR what it is seeing per record
and what to do with it. So it's really just MR input data, same as any other
semi-structured input to a Pig job or whatever.
The Giraph job has produced useful output data, and it doesn't matter to the
next user of that data whether the values in each record are edge weights or
vertex values. That user is just processing the data in each record for its own
workflow, and only needs to know that each record contains values that make
it past sanity checks and are formatted right for extracting into data
structures. The way the data was produced in Giraph only mattered to Giraph.
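To make that concrete, here is a rough sketch of what a downstream consumer might look like. The "key<TAB>value" record layout and the names parse_record and load_records are assumptions made up for illustration, not an actual Giraph output format or API. Note that nothing in it knows or cares whether the number in each record started life as a vertex value or an edge weight.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_record(line: str) -> Optional[Tuple[str, float]]:
    """Parse one hypothetical 'key<TAB>value' text record.

    Returns None when the record fails the sanity checks.
    """
    fields = line.rstrip("\n").split("\t")
    # Sanity check: exactly two fields and a non-empty key.
    if len(fields) != 2 or not fields[0]:
        return None
    try:
        value = float(fields[1])
    except ValueError:
        # The second field is not numeric, so reject the record.
        return None
    return fields[0], value


def load_records(lines: Iterable[str]) -> Dict[str, float]:
    """Extract every well-formed record into a dict, skipping bad ones."""
    table: Dict[str, float] = {}
    for line in lines:
        parsed = parse_record(line)
        if parsed is not None:
            key, value = parsed
            table[key] = value
    return table
```

The consumer only relies on the record format and its own sanity checks; the graph semantics stay behind in the job that produced the data.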
Does this make any sense? Or, in actual use, do you find that some assumptions
about the "graph structure" of the data as it was produced are expected or
utilized in, say, a Hive job after the fact, on that same output data? I guess
if the data is going right back into another Giraph job this would be the case.
> Giraph can receive input from vertex or edge-centric data sets; its output is
> graph data, not "vertices"
> --------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-524
> URL: https://issues.apache.org/jira/browse/GIRAPH-524
> Project: Giraph
> Issue Type: Bug
> Components: graph
> Reporter: Eli Reisman
> Priority: Minor
> Fix For: 0.2.0
>
>
> It is silly to have any of our Output format names tied to the "vertex" when
> in fact we are just outputting graph data. The output format names should
> reflect the formatting of the output, and perhaps which elements of the graph
> data you want in the output.
> Let's change those names? Then they get shorter too, as a bonus.