Github user graben1437 commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/74#issuecomment-117801830
  
    For the SparkGraphComputer testing:
    
    Set up Spark 1.2.1 (prebuilt) in standalone cluster mode with 1 master and
    2 workers. Started the master and the workers and verified they were
    running via the command line and the Spark UI.
    
    On the same node, I built the latest TinkerPop3 (as of 7/1) with the jsr305
    patch from this pull request applied, then verified that no jsr305 jar is
    present under the distribution:
    /home/..../tinkerpop3/incubator-tinkerpop/hadoop-gremlin
    find . -name 'jsr305*'
    <<nothing returned>>
    Without the fix in place, the find command returns:
    ./hadoop-gremlin/target/hadoop-gremlin-3.0.0-SNAPSHOT-standalone/lib/jsr305-1.3.9.jar
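    The same check can be scripted. A minimal Python sketch of the
    `find . -name 'jsr305*'` step; the helper name `find_files` is
    hypothetical and not part of TinkerPop. Like `find -name`, it matches on
    the file name only, but it reports files and skips directories:

```python
import fnmatch
import os


def find_files(root, pattern="jsr305*"):
    """Return sorted paths under `root` whose file name matches `pattern`.

    Roughly `find <root> -name '<pattern>'`, restricted to files."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch applies shell-style globbing to the bare file name
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

    An empty list corresponds to the "nothing returned" result above.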
    
    Also grepped the jar files to verify that jsr305 is not packaged:
    grep jsr305 *.jar
    << no output>>
    
    The following is the output when jsr305 is present:
    ./incubator-tinkerpop/hadoop-gremlin/target
    grep jsr305 *.jar
    Binary file hadoop-gremlin-3.0.0-SNAPSHOT-job.jar matches
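    grep only scans a jar's raw bytes. Zip archives store entry names
    uncompressed but usually deflate entry contents, so a "Binary file ...
    matches" hit most likely comes from an entry name. A sketch using Python's
    standard `zipfile` module (the helper `jar_entries_matching` is
    hypothetical) that lists the matching entries directly:

```python
import zipfile


def jar_entries_matching(jar_path, needle="jsr305"):
    """Return the names of entries inside a jar (a zip archive) that contain `needle`."""
    with zipfile.ZipFile(jar_path) as jar:
        # namelist() reads the zip central directory, so no entry is decompressed
        return [name for name in jar.namelist() if needle in name]
```

    Running it against hadoop-gremlin-3.0.0-SNAPSHOT-job.jar would show which
    entries caused the grep match.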
    
    At the very end, after testing, I went to the spark-1.2.1/work directory
    and ran the following command to verify that jsr305 was not among the jars
    being shipped to Spark:
    find . -name '*.jar' -exec grep -H jsr305 {} \;
    <<returned nothing>>
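    A slightly stricter version of this scan, as a sketch: it walks the work
    directory, opens each jar, and flags entries whose names contain "jsr305"
    as well as a marker class. Using `javax/annotation/Nonnull.class` as the
    marker is an assumption (jsr305 provides `javax.annotation.Nonnull`), meant
    to catch jars that bundle jsr305 classes without naming them; `scan_jars`
    is a hypothetical helper:

```python
import os
import zipfile

# Assumption: javax/annotation/Nonnull.class serves as a marker for jsr305
# classes. The jsr305 artifact provides javax.annotation.Nonnull, so a jar that
# bundles jsr305 classes will normally contain this entry even when "jsr305"
# appears nowhere in its file name or entry names.
MARKER = "javax/annotation/Nonnull.class"


def scan_jars(root, needle="jsr305", marker=MARKER):
    """Walk `root` (e.g. spark-1.2.1/work) and map each suspicious jar path to
    the entry names that contained `needle` or equalled `marker`."""
    findings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    hits = [e for e in jar.namelist() if needle in e or e == marker]
            except zipfile.BadZipFile:
                hits = []  # unreadable archive: fall back to the file-name check
            if hits or needle in name:
                findings[path] = hits
    return findings
```

    An empty dict corresponds to the "returned nothing" result.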
    
    Next:
    Under gremlin-console/target, I unzipped
    apache-gremlin-console-3.0.0-SNAPSHOT-distribution.zip:
    cd apache-gremlin-console-3.0.0-SNAPSHOT
    vi conf/hadoop/hadoop-gryo.properties
    In that file, comment out the local master and point spark.master at the
    standalone master:
    #spark.master=local[4]
    spark.master=spark://machine1.xx.xxx.xxx.com:7077
    This is the URL of the Spark 1.2.1 master started above.
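    For repeatable setups, the same edit can be applied with a small script. A
    sketch, assuming the file uses `key=value` or `key:value` lines (Java
    properties also allow a plain whitespace separator, which this does not
    handle); `set_spark_master` is a hypothetical helper and
    `spark://master-host:7077` a placeholder URL:

```python
import re

# Matches an active (uncommented) spark.master line using '=' or ':' as separator.
_SPARK_MASTER = re.compile(r"\s*spark\.master\s*[=:]")


def set_spark_master(text, master_url):
    """Comment out active spark.master lines in .properties text and put a
    spark.master entry for `master_url` right after the first one (or at the
    end if there was none)."""
    out, replaced = [], False
    for line in text.splitlines():
        if _SPARK_MASTER.match(line):
            out.append("#" + line.lstrip())  # keep the old value as a comment
            if not replaced:
                out.append("spark.master=" + master_url)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append("spark.master=" + master_url)
    return "\n".join(out) + "\n"
```

    Usage: read conf/hadoop/hadoop-gryo.properties, pass its contents and
    "spark://master-host:7077" to `set_spark_master`, and write the result back.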
    
    I also copied the jar files from ./ext/hadoop-gremlin/lib over those in
    ./lib to eliminate Spark errors about class serialization.
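    A sketch of that copy step that also reports which jars in ./lib were
    replaced; `overlay_jars` is a hypothetical helper built on the standard
    `glob` and `shutil` modules:

```python
import glob
import os
import shutil


def overlay_jars(src_dir, dst_dir):
    """Copy every *.jar in src_dir into dst_dir, overwriting same-named jars.

    Returns (copied, overwritten): sorted lists of jar file names."""
    copied, overwritten = [], []
    for src in sorted(glob.glob(os.path.join(src_dir, "*.jar"))):
        name = os.path.basename(src)
        dst = os.path.join(dst_dir, name)
        if os.path.exists(dst):
            overwritten.append(name)
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(name)
    return copied, overwritten
```

    Only jars with identical file names are overwritten. A jar whose name
    differs (for example, another version of the same library) ends up next to
    the existing one in ./lib, so names in `copied` but not in `overwritten`
    are worth a quick look.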
    
    The following queries were run; I checked that their output was correct
    and that the Spark master and worker logs showed no exceptions:
    
    bin/gremlin.sh
    
             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.utilities
    INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph  - HADOOP_GREMLIN_LIBS is set to: /home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/ext/hadoop-gremlin/lib
    plugin activated: tinkerpop.hadoop
    plugin activated: tinkerpop.tinkergraph
    gremlin> graph = GraphFactory.open('/home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/conf/hadoop/hadoop-gryo.properties')
    ==>hadoopgraph[gryoinputformat->gryooutputformat]
    gremlin> g=graph.traversal(computer(SparkGraphComputer))
    ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
    gremlin> g.V().count()
    ==>6
    gremlin> g.V().group().by(bothE().count()) 
    ==>[1:[v[6], v[5], v[2]], 3:[v[4], v[1], v[3]]]
    gremlin> g.V().groupCount('a').by(label).cap('a')
    ==>[software:2, person:4]
    gremlin> g.V().range(0,3)
    WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy  - Snappy native library not loaded
    ==>v[4]
    ==>v[1]
    ==>v[6]
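    One way to confirm the results independently is to recompute them from the
    input data. A sketch, assuming the input is TinkerPop's "modern" toy graph
    (6 vertices, 6 edges), which the counts above are consistent with; the
    helper names are hypothetical:

```python
from collections import Counter, defaultdict

# Assumption: the input is TinkerPop's "modern" toy graph.
VERTEX_LABELS = {1: "person", 2: "person", 3: "software",
                 4: "person", 5: "software", 6: "person"}
EDGES = [(1, 2, "knows"), (1, 4, "knows"), (1, 3, "created"),
         (4, 5, "created"), (4, 3, "created"), (6, 3, "created")]


def vertex_count():
    """Expected result of g.V().count()."""
    return len(VERTEX_LABELS)


def group_by_degree():
    """Expected result of g.V().group().by(bothE().count()), as sets of ids."""
    degree = Counter()
    for out_v, in_v, _label in EDGES:
        degree[out_v] += 1  # each edge counts once for each endpoint
        degree[in_v] += 1
    groups = defaultdict(set)
    for v in VERTEX_LABELS:
        groups[degree[v]].add(v)
    return dict(groups)


def label_counts():
    """Expected result of g.V().groupCount('a').by(label).cap('a')."""
    return dict(Counter(VERTEX_LABELS.values()))
```

    The groups are compared as sets because the order of vertices inside each
    group is not something a distributed run should be expected to preserve.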
    
    Based on this small sample of queries against a standalone Spark cluster,
    removing the jsr305 jar from the standalone and/or distribution jar does
    not appear to adversely affect SparkGraphComputer functionality.
    
    I will test the GiraphGraphComputer next, assuming everything here looks
    correct.


