GitHub user okram opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/172
TINKERPOP-1027: Merge view prior to writing graphRDD to output format/rdd
https://issues.apache.org/jira/browse/TINKERPOP-1027
We had a bug in Spark `graphRDD` writing that showed itself for
particular providers. @dalaro identified the problem and provided a
solution. This PR implements @dalaro's recommended fix. This fix also removes
the need for `reduceByKey()` (though it is backwards compatible if you still have
it) and allows us to always use `GryoSerialization` with Spark. This is rad. I
added a few more required serialization registrations to `GryoSerialization` and
all the test cases pass. I also added some more test cases to ensure proper
functioning.
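To make the "merge view prior to writing" idea concrete, here is a minimal sketch in plain Java collections. It is not the actual Spark code in this PR: `ViewMergeSketch`, `mergeView`, and the map-based stand-ins for the graph and the view are hypothetical names chosen for illustration. The assumption is that the view holds the computed properties per vertex id and must be folded into each vertex (left-outer-join style) before the graph is handed to an output format/RDD.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; these types stand in for graphRDD and the view.
public class ViewMergeSketch {

    // Left-outer-join style merge: every vertex is kept, and any computed
    // properties found in the view for that vertex id are copied onto it.
    // The input maps are not modified.
    public static Map<Object, Map<String, Object>> mergeView(
            Map<Object, Map<String, Object>> graph,
            Map<Object, Map<String, Object>> view) {
        Map<Object, Map<String, Object>> merged = new HashMap<>();
        for (Map.Entry<Object, Map<String, Object>> vertex : graph.entrySet()) {
            Map<String, Object> properties = new HashMap<>(vertex.getValue());
            Map<String, Object> computed = view.get(vertex.getKey());
            if (computed != null) {
                properties.putAll(computed); // computed values win on key clashes
            }
            merged.put(vertex.getKey(), properties);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Object, Map<String, Object>> graph = new HashMap<>();
        Map<String, Object> marko = new HashMap<>();
        marko.put("name", "marko");
        graph.put(1, marko);

        Map<Object, Map<String, Object>> view = new HashMap<>();
        Map<String, Object> computed = new HashMap<>();
        computed.put("pageRank", 0.15); // made-up value for the example
        view.put(1, computed);

        // The merged graph is what would be written out, so the writer
        // sees the computed properties without a separate reduce step.
        System.out.println(mergeView(graph, view));
    }
}
```

In this sketch, writing the merged result rather than the raw graph is what lets the output carry the computed properties; vertices with no entry in the view pass through unchanged.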
* Spark integration tests passed.
* `mvn clean install` passed.
VOTE +1.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1027
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/172.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #172
----
commit 5c7bc38bdb42ae50243f58a22fc74bc094be6333
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-04T15:36:08Z
mapReduceRDD makes use of a post view merge. @dalaro realized this was
important prior to graph writing. Thus, moved the view merge to pre-mapreduce
and pre-graph output. Added more rigorous property checking to
PageRankVertexProgramTest. InputFormatRDD and ToyGraphInputRDD no longer
require reduceByKey() initiation because of merged views.
commit e45c293425ed4d9c317b5efbb3a81a9874f7e0e6
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-04T18:14:50Z
numerous tweaks trying to get things clean and clear. Added more tests to
PersistedInputOutputRDDTest that show good long chain vertex programs with
various degrees of Persist and Hadoop OLTP access, etc. Looking good. Still
BulkLoaderVertexProgram problem with InputRDD... don't know what the problem is
still (unfortunately).
commit 42bcd89d7cd3d297d958ad22919377e94a149b0e
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-07T18:14:29Z
Merge branch 'TINKERPOP-1025' into TINKERPOP-1027
----