[ 
https://issues.apache.org/jira/browse/TINKERPOP-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045431#comment-15045431
 ] 

ASF GitHub Bot commented on TINKERPOP-1027:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/172

    TINKERPOP-1027: Merge view prior to writing graphRDD to output format/rdd

    https://issues.apache.org/jira/browse/TINKERPOP-1027
    
    We had a bug in Spark `graphRDD` writing that showed itself for 
particular providers. @dalaro identified the problem and provided a 
solution. This PR implements @dalaro's recommended fix. The fix also removes 
the need for `reduceByKey()` (it remains backwards compatible if you still have 
it) and allows us to always use `GryoSerialization` with Spark. This is rad. I 
added a few more required serialization registrations to `GryoSerialization` and 
all the test cases pass. I also added more test cases to ensure proper 
functioning.
    
    * Spark integration tests passed.
    * `mvn clean install` passed.
    
    VOTE +1.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1027

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #172
    
----
commit 5c7bc38bdb42ae50243f58a22fc74bc094be6333
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-04T15:36:08Z

    mapReduceRDD makes use of a post view merge. @dalaro realized this was 
important prior to graph writing. Thus, moved the view merge to pre-mapreduce 
and pre-graph output. Added more rigorous property checking to 
PageRankVertexProgramTest. InputFormatRDD and ToyGraphInputRDD no longer 
require reduceByKey() initiation because of merged views.

commit e45c293425ed4d9c317b5efbb3a81a9874f7e0e6
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-04T18:14:50Z

    Numerous tweaks to get things clean and clear. Added more tests to 
PersistedInputOutputRDDTest that show long chains of vertex programs working 
with various degrees of Persist and Hadoop OLTP access, etc. Looking good. There 
is still a BulkLoaderVertexProgram problem with InputRDD... I don't know what 
the cause is yet (unfortunately).

commit 42bcd89d7cd3d297d958ad22919377e94a149b0e
Author: Marko A. Rodriguez <[email protected]>
Date:   2015-12-07T18:14:29Z

    Merge branch 'TINKERPOP-1025' into TINKERPOP-1027

----


> Merge view prior to writing graphRDD to output format/rdd
> ---------------------------------------------------------
>
>                 Key: TINKERPOP-1027
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1027
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.1.1-incubating
>
>
> [~dalaro] noted that DSEGraph was not happy with the current {{graphRDD}} 
> model when it comes to writing. To make it happy, the view merge needs to 
> happen prior to {{graphRDD}} output. Thus, move the {{mapReduceRDD}} view 
> merge to before {{graphRDD}} writing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
