[
https://issues.apache.org/jira/browse/TINKERPOP-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622698#comment-15622698
]
Marko A. Rodriguez commented on TINKERPOP-1118:
-----------------------------------------------
I think we can get rid of the {{VertexWritable}}/{{ObjectWritable}}
serialization issues if we solve this ticket. cc/ [~dalaro]
Right now, {{VertexWritable}} and {{ObjectWritable}} have their own
serialization logic. This matters because these classes are used not only for
running jobs, but also for reading and writing {{SequenceFiles}}. In Spark, we
don't need the RDD to use these writables; it can reference the wrapped objects
directly. That would give us a cleaner split between {{GryoInput/OutputFormat}}
and the internal job serialization (message passing and the like).
> SparkGraphComputer should use StarGraph, not VertexWritable.
> ------------------------------------------------------------
>
> Key: TINKERPOP-1118
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1118
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.1.1-incubating
> Reporter: Marko A. Rodriguez
> Labels: breaking
> Fix For: 3.3.0
>
>
> {{SparkGraphComputer}} input RDDs are typed as:
> {code}
> JavaPairRDD<Object,VertexWritable>
> {code}
> The {{VertexWritable}} usage is a vestige of Hadoop and Giraph. In Spark,
> we don't need this wrapper, and thus we can reduce the overhead (one fewer
> object header) by typing the input RDDs as:
> {code}
> JavaPairRDD<Object,StarGraph>
> {code}
> This would be a breaking change for graph providers that implement their own
> {{InputRDD}} and {{OutputRDD}}; however, the fix is trivial. Instead of {{new
> VertexWritable(vertex)}}, they would simply do {{StarGraph.of(vertex)}}.
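> As a sketch of what that provider-side change might look like (the
> surrounding {{mapToPair}} plumbing is illustrative, not taken from any
> particular {{InputRDD}} implementation; only {{new VertexWritable(vertex)}}
> and {{StarGraph.of(vertex)}} come from this ticket):
> {code}
> // Before: each vertex wrapped in a Hadoop Writable (illustrative fragment)
> JavaPairRDD<Object, VertexWritable> before =
>     vertices.mapToPair(v -> new Tuple2<>(v.id(), new VertexWritable(v)));
>
> // After: reference the star graph directly -- no Writable wrapper,
> // one fewer object header per vertex
> JavaPairRDD<Object, StarGraph> after =
>     vertices.mapToPair(v -> new Tuple2<>(v.id(), StarGraph.of(v)));
> {code}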
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)