GitHub user okram opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/168
TINKERPOP3-1011: HadoopGraph can't re-attach when the InputFormat is not a
FileInputFormat
https://issues.apache.org/jira/browse/TINKERPOP3-1011
I really half-assed our `InputRDD` work in 3.1.0. Well, it works, but some
providers have to do stupid workarounds. The reason I never caught
the problem was that I didn't have a robust test infrastructure for it. I have
since rectified the situation and more. The `SparkGraphComputer` integration
tests now choose between using Gryo, GraphSON, or InputRDD as the source data.
Thus, we are able to test `SparkGraphComputer` without any communication to
Hadoop HDFS.
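This is a minimal Java sketch of what "choose between Gryo, GraphSON, or InputRDD" can look like: one source is picked per test run and turned into configuration entries. The `TestSource` enum, the `TestSourceConfig` class, and the `example.*` configuration keys are hypothetical placeholders. They are not TinkerPop's actual test harness or property names, and the class names stored as values are only string labels.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical: the three kinds of source data an integration test run can read from.
enum TestSource { GRYO, GRAPHSON, INPUT_RDD }

final class TestSourceConfig {
    // Hypothetical configuration keys (assumptions, not TinkerPop's real property names).
    static final String INPUT_FORMAT_KEY = "example.inputFormat";
    static final String INPUT_RDD_KEY = "example.inputRDD";

    // Builds the configuration entries for one chosen source.
    static Map<String, String> configFor(TestSource source) {
        Map<String, String> conf = new HashMap<>();
        switch (source) {
            case GRYO:
                conf.put(INPUT_FORMAT_KEY, "GryoInputFormat");     // file-based, read through Hadoop
                break;
            case GRAPHSON:
                conf.put(INPUT_FORMAT_KEY, "GraphSONInputFormat"); // file-based, read through Hadoop
                break;
            case INPUT_RDD:
                conf.put(INPUT_RDD_KEY, "ToyGraphInputRDD");       // in-memory, no HDFS involved
                break;
        }
        return conf;
    }

    // Picks one source per run; the same seed always yields the same source.
    static TestSource pick(long seed) {
        TestSource[] all = TestSource.values();
        return all[new Random(seed).nextInt(all.length)];
    }
}
```

Seeding the choice keeps a failing run reproducible while different runs still cover different sources.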
I also solved a long-standing serialization issue with `WrappedArray` in
`GryoSerializer`. This means we can now ALWAYS use `GryoSerializer`.
I have changed all the relevant tests to use `GryoSerializer` and TADA,
happy happy Kryo.
Finally, `HadoopElementIterator` was always hard-wired to `FileInputFormat`.
Why the hell did I leave it like that for so long?! Now ANY `InputFormat` can
be streamed into Hadoop OLTP, including `InputRDD`s via `InputRDDFormat`.
Ballin'.
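Decoupling the iterator from `FileInputFormat` means it relies only on the generic "list the splits, open a reader per split" contract. The Java sketch below illustrates this using simplified, hypothetical stand-ins: `SimpleInputFormat` and `SimpleRecordReader` take the place of Hadoop's `InputFormat` and `RecordReader`, and `ElementIterator` takes the place of `HadoopElementIterator`. It is not TinkerPop's implementation, and it leaves out Hadoop's job and task contexts.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-in for Hadoop's RecordReader.
interface SimpleRecordReader<V> {
    boolean nextKeyValue(); // advance; false once the split is exhausted
    V getCurrentValue();
    void close();
}

// Hypothetical, simplified stand-in for Hadoop's InputFormat (no file paths assumed).
interface SimpleInputFormat<V> {
    List<Object> getSplits();
    SimpleRecordReader<V> createRecordReader(Object split);
}

// Streams every record of every split of ANY SimpleInputFormat, one reader open at a time.
// Records must be non-null, because null marks "no more elements" here.
final class ElementIterator<V> implements Iterator<V> {
    private final SimpleInputFormat<V> format;
    private final Iterator<Object> splits;
    private SimpleRecordReader<V> reader; // reader for the current split, or null
    private V next;                       // prefetched record, or null when exhausted

    ElementIterator(SimpleInputFormat<V> format) {
        this.format = format;
        this.splits = format.getSplits().iterator();
        advance();
    }

    // Moves to the next record, closing finished readers and opening the next split as needed.
    private void advance() {
        next = null;
        while (true) {
            if (reader != null && reader.nextKeyValue()) {
                next = reader.getCurrentValue();
                return;
            }
            if (reader != null) {
                reader.close();
                reader = null;
            }
            if (!splits.hasNext()) return;
            reader = format.createRecordReader(splits.next());
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public V next() {
        if (next == null) throw new NoSuchElementException();
        V current = next;
        advance();
        return current;
    }
}
```

Nothing in the iterator refers to file paths. Any format that can enumerate splits and open a reader per split will therefore work, including one backed by in-memory data.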
Finally finally, a small tweak to `BulkLoaderVertexProgramTest` so that it uses
`target/test-output` as its data storage location (not `/tmp`).
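Below is a minimal sketch of the `target/test-output` convention. It assumes a hypothetical `TestOutputDirs` helper, which is not part of TinkerPop, that resolves a per-test directory relative to the Maven module's working directory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves (and creates) a per-test data directory under the build
// directory instead of /tmp.
final class TestOutputDirs {
    static Path forTest(Class<?> testClass) throws IOException {
        Path dir = Paths.get("target", "test-output", testClass.getSimpleName());
        return Files.createDirectories(dir); // no-op if it already exists
    }
}
```

Because the test data lives under `target/`, `mvn clean` removes it. Files written to `/tmp` would outlive the build.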
Ran `mvn clean install` and did integration testing on `spark-gremlin/`.
All is golden.
VOTE +1.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP3-1011
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/168.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #168
----
commit e954408b6f30f33d17316551c6d014cd63ab83c2
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-01T16:25:14Z
HadoopElementIterator now supports ANY InputFormat, not just
FileInputFormat. Sweet. Also, if you are using an RDD in Spark (and thus, not
really doing Hadoop InputFormat stuffs), we have InputRDDFormat which wraps an
RDD in an InputFormat so HadoopElementIterator works as well. This solves the
HadoopGraph OLTP problem for ALL InputFormats, and it allows ComputerResultStep
to attach elements for more than just FileInputFormats. Good stuff.
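Wrapping an RDD in an InputFormat is the classic adapter pattern. The minimal Java sketch below assumes a hypothetical `SplitInputFormat` contract and a `PartitionedDataInputFormat` adapter. Its nested lists stand in for an RDD's partitions, and no Spark or Hadoop classes are used, so this is not the real `InputRDDFormat`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified InputFormat-like contract: a number of splits, each readable as an iterator.
interface SplitInputFormat<V> {
    int numSplits();
    Iterator<V> openSplit(int index);
}

// Adapter: exposes already-partitioned in-memory data (a stand-in for an RDD's partitions)
// through the InputFormat-like contract, so consumers never need a file path.
final class PartitionedDataInputFormat<V> implements SplitInputFormat<V> {
    private final List<List<V>> partitions;

    PartitionedDataInputFormat(List<List<V>> partitions) {
        List<List<V>> copy = new ArrayList<>();
        for (List<V> p : partitions) copy.add(Collections.unmodifiableList(new ArrayList<>(p)));
        this.partitions = Collections.unmodifiableList(copy); // defensive, read-only snapshot
    }

    @Override public int numSplits() { return partitions.size(); }

    @Override public Iterator<V> openSplit(int index) { return partitions.get(index).iterator(); }
}

// A consumer written only against the contract, playing the role of an element iterator.
final class Consumers {
    static <V> List<V> readAll(SplitInputFormat<V> format) {
        List<V> out = new ArrayList<>();
        for (int i = 0; i < format.numSplits(); i++) {
            Iterator<V> it = format.openSplit(i);
            while (it.hasNext()) out.add(it.next());
        }
        return out;
    }
}
```

`Consumers.readAll` cannot tell whether the splits came from files or from memory. That is the property that lets the OLTP path treat an RDD-backed source like any other InputFormat.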
commit c374f4ecb1c2c82f911b5fb0a19a66ed663eea60
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-02T19:02:09Z
I now have the SparkIntegrationTestSuite testing from either a Gryo
FileInputFormat, a GraphSON FileInputFormat, or an InputRDD. This gives us super
coverage and proves that InputRDD (bypassing Hadoop) is working as expected. I
also fixed up some other tests that used KryoSerializer instead of
GryoSerializer, since I learned how to deal with Scala's WrappedArray class. It was
insane. This is really good stuff.
commit 311e8abe733995f970d2b1aeb8153584b6ba3024
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-02T19:11:53Z
Merge branch 'master' into TINKERPOP3-1011
commit e20ff91995f83efd4ebbe9388b0aec1535669ecb
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-02T20:11:29Z
some organization and clean up. Stuff is lookin SOLID. Time to run full
integration tests.
commit 91efb28df23fdc0dc99c78bd72159f0478614df1
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-02T22:44:50Z
GroovyProcessComputerSuite was missing GroovyFlatMapTest. Added it. Added
HadoopPool registration to ToyGraphInputRDD so it doesn't give a WARN message.
Also I tweaked BulkLoaderVertexProgramTest to use target/test-output/ for its
intermediary data.
commit 796fe24ae5a4a858b6b301ffa86efc63a2d98758
Author: Marko A. Rodriguez <[email protected]>
Date: 2015-12-02T22:49:42Z
GroovyProcessStandardSuite was missing GroovyFlatMapTest. Added.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---