GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168

    TINKERPOP3-1011: HadoopGraph can't re-attach when the InputFormat is not a
    FileInputFormat

    https://issues.apache.org/jira/browse/TINKERPOP3-1011
    
    I really half-assed our `InputRDD` work in 3.1.0. It works, but some
    providers have to resort to stupid workarounds. I never caught the problem
    because I didn't have a robust test infrastructure for it. I have since
    rectified the situation and more. The `SparkGraphComputer` integration
    tests now choose between Gryo, GraphSON, or an `InputRDD` as the source
    data. Thus, with an `InputRDD` source, we are able to test
    `SparkGraphComputer` without any communication with HDFS.
    
    I also solved a long-standing serialization issue with `WrappedArray` in
    `GryoSerializer`, so we can now ALWAYS use `GryoSerializer`. I have changed
    all the respective tests to use `GryoSerializer` and TADA, happy happy
    Kryo.
    
    Finally, `HadoopElementIterator` was always bound to `FileInputFormat`.
    Why the hell did I leave it like that for so long?! Now ANY `InputFormat`
    can be streamed into HadoopGraph OLTP, including an `InputRDD` via
    `InputRDDFormat`. Ballin'.
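
The decoupling described above can be sketched in plain Java. This is an illustration only, not the code in this pull request: `RecordSource`, `SourceFormat`, `InMemoryFormat`, and `ElementIterator` are hypothetical stand-ins for Hadoop's `RecordReader` and `InputFormat`, for `InputRDDFormat`, and for `HadoopElementIterator` respectively, and no Hadoop or Spark API is used. The point is only that the iterator depends on the general abstraction, so any source that can be adapted to it can be streamed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class AnySourceIteratorSketch {

    /** Hypothetical stand-in for a Hadoop RecordReader. */
    public interface RecordSource<T> {
        boolean advance();   // move to the next record; false when exhausted
        T current();         // the record at the current position
    }

    /** Hypothetical stand-in for an InputFormat: opens one reader per split. */
    public interface SourceFormat<T> {
        List<RecordSource<T>> openReaders();
    }

    /** Adapter over in-memory partitions (the role InputRDDFormat plays for an RDD). */
    public static final class InMemoryFormat<T> implements SourceFormat<T> {
        private final List<List<T>> partitions;

        public InMemoryFormat(List<List<T>> partitions) {
            this.partitions = partitions;
        }

        @Override
        public List<RecordSource<T>> openReaders() {
            List<RecordSource<T>> readers = new ArrayList<>();
            for (List<T> partition : partitions) {
                Iterator<T> it = partition.iterator();
                readers.add(new RecordSource<T>() {
                    private T current;

                    @Override
                    public boolean advance() {
                        if (!it.hasNext()) return false;
                        current = it.next();
                        return true;
                    }

                    @Override
                    public T current() {
                        return current;
                    }
                });
            }
            return readers;
        }
    }

    /** Iterates every record of every reader; depends only on SourceFormat. */
    public static final class ElementIterator<T> implements Iterator<T> {
        private final Iterator<RecordSource<T>> readers;
        private RecordSource<T> active;
        private boolean buffered;   // true when active.current() has not been returned yet

        public ElementIterator(SourceFormat<T> format) {
            this.readers = format.openReaders().iterator();
        }

        @Override
        public boolean hasNext() {
            if (buffered) return true;
            while (true) {
                if (active != null && active.advance()) {
                    buffered = true;
                    return true;
                }
                if (!readers.hasNext()) return false;
                active = readers.next();   // current reader is exhausted, try the next one
            }
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            buffered = false;
            return active.current();
        }
    }

    /** Drains every record of a format into a list. */
    public static <T> List<T> drain(SourceFormat<T> format) {
        List<T> out = new ArrayList<>();
        new ElementIterator<>(format).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        SourceFormat<String> format = new InMemoryFormat<>(Arrays.asList(
                Arrays.asList("marko", "vadas"),
                Arrays.<String>asList(),
                Arrays.asList("josh")));
        System.out.println(drain(format));   // prints [marko, vadas, josh]
    }
}
```

A file-backed format would simply be another `SourceFormat` implementation in this sketch; `ElementIterator` would not need to change.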
    
    Finally finally, small tweak to `BulkLoaderVertexProgramTest` to use
    `target/test-output` as its data storage location (not `/tmp`).
    
    Ran `mvn clean install` and did integration testing on `spark-gremlin/`.
    All is golden.
    
    VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP3-1011

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #168
    
----
commit e954408b6f30f33d17316551c6d014cd63ab83c2
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-01T16:25:14Z

    HadoopElementIterator now supports ANY InputFormat, not just
    FileInputFormat. Sweet. Also, if you are using an RDD in Spark (and thus,
    not really doing Hadoop InputFormat stuffs), we have InputRDDFormat which
    wraps an RDD in an InputFormat so HadoopElementIterator works as well. This
    solves the HadoopGraph OLTP problem for ALL InputFormats and it allows
    ComputerResultStep to Attach elements for more than just FileInputFormats.
    Good stuff.

commit c374f4ecb1c2c82f911b5fb0a19a66ed663eea60
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-02T19:02:09Z

    I have the SparkIntegrationTestSuite now testing either from Gryo
    FileInputFormat, GraphSON FileInputFormat, or an InputRDD. This gives us
    super coverage and proves that InputRDD (bypassing Hadoop) is working as
    expected. I also fixed up some other tests that used KryoSerializer instead
    of GryoSerializer as I learned how to deal with Scala's WrappedArray class.
    It was insane. This is really good stuff.

commit 311e8abe733995f970d2b1aeb8153584b6ba3024
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-02T19:11:53Z

    Merge branch 'master' into TINKERPOP3-1011

commit e20ff91995f83efd4ebbe9388b0aec1535669ecb
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-02T20:11:29Z

    some organization and clean up. Stuff is lookin SOLID. Time to run full
    integration tests.

commit 91efb28df23fdc0dc99c78bd72159f0478614df1
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-02T22:44:50Z

    GroovyProcessComputerSuite was missing GroovyFlatMapTest. Added it. Added
    HadoopPool registration to ToyGraphInputRDD so it doesn't give a WARN
    message. Also I tweaked BulkLoaderVertexProgramTest to use
    target/test-output/ for its intermediary data.

commit 796fe24ae5a4a858b6b301ffa86efc63a2d98758
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-02T22:49:42Z

    GroovyProcessStandard was missing GroovyFlatMapTest. Added.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
