Github user graben1437 commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/74#issuecomment-117801830
For the SparkGraphComputer testing:
Set up Spark 1.2.1 (prebuilt) in standalone cluster mode with 1 master and
2 workers.
Started the master and the workers and verified they were running via
command line and Spark UI.
On the same node, I built the latest (as of 7/1) TinkerPop3 with the JSR
patch in this pull request in place. Verified that the jsr305 jar was not
found under the distribution:
/home/..../tinkerpop3/incubator-tinkerpop/hadoop-gremlin
find . -name 'jsr305*'
<<nothing returned>>
Without the fix in place, the same find command returns:
./hadoop-gremlin/target/hadoop-gremlin-3.0.0-SNAPSHOT-standalone/lib/jsr305-1.3.9.jar
Also grepped the jar files to verify that jsr305 is not packaged:
grep jsr305 *.jar
<< no output>>
When the jsr305 jar is present, the output is:
./incubator-tinkerpop/hadoop-gremlin/target
grep jsr305 *.jar
Binary file hadoop-gremlin-3.0.0-SNAPSHOT-job.jar matches
At the very end, after testing, I went to the spark-1.2.1/work directory
and ran the following command to verify that jsr305 was not in the "jar
loads" being sent to Spark:
find . -name '*.jar' -exec grep -H jsr305 {} \;
<< returned nothing>>
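The find/grep checks above could be wrapped into a small reusable script. The following is only a sketch under assumptions: the function names `jars_containing` and `check_no_jsr305` are hypothetical and not part of TinkerPop or Spark, and the check relies on the same observation as the commands above, namely that grepping a jar's raw bytes finds the names of the entries it contains.

```shell
# Hypothetical helper (not part of TinkerPop or Spark): print every *.jar
# file under directory $1 whose raw bytes contain the fixed string $2.
jars_containing() {
    dir=$1
    needle=$2
    # The -name pattern is quoted so the shell does not expand it first.
    # grep -l prints only the names of matching files; -F treats the
    # needle as a fixed string rather than a regular expression.
    # "|| true" keeps the function's status at 0 when nothing matches;
    # callers inspect the output instead.
    find "$dir" -type f -name '*.jar' -exec grep -lF -- "$needle" {} + || true
}

# Hypothetical helper: succeed only if no jar under $1 mentions jsr305.
check_no_jsr305() {
    hits=$(jars_containing "$1" jsr305)
    if [ -n "$hits" ]; then
        printf 'jsr305 still present in:\n%s\n' "$hits" >&2
        return 1
    fi
    return 0
}
```

For example, `check_no_jsr305 .` run from the hadoop-gremlin/target directory or from the spark-1.2.1/work directory would exit non-zero if any jar there still referenced jsr305.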
Next:
Under gremlin-console/target I unzipped
apache-gremlin-console-3.0.0-SNAPSHOT-distribution.zip
cd apache-gremlin-console-3.0.0-SNAPSHOT
vi conf/hadoop/hadoop-gryo.properties
In that file change:
#spark.master=local[4]
spark.master=spark://machine1.xx.xxx.xxx.com:7077
which is the address of the Spark 1.2.1 master started above.
I also copied ./ext/hadoop-gremlin/lib jar files over the ./lib files to
eliminate Spark errors about class serialization.
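The two manual steps above (editing the properties file and copying the plugin jars) could be scripted roughly as follows. This is a sketch, not the procedure actually used: the helper names `set_spark_master` and `copy_plugin_jars` are hypothetical, and the master URL in the usage note is a placeholder, since the real host name is redacted above.

```shell
# Hypothetical helper: comment out any active spark.master line in the
# properties file $1 and append a new one pointing at master URL $2.
# Assumes the file ends with a newline.
set_spark_master() {
    props=$1
    master=$2
    tmp="$props.tmp"
    # Lines that are already commented out do not match and are left alone.
    sed 's/^spark\.master=/#spark.master=/' "$props" > "$tmp"
    printf 'spark.master=%s\n' "$master" >> "$tmp"
    mv "$tmp" "$props"
}

# Hypothetical helper: copy the hadoop-gremlin plugin jars into the
# console's lib directory; $1 is the unzipped console directory.
copy_plugin_jars() {
    console_home=$1
    cp "$console_home"/ext/hadoop-gremlin/lib/*.jar "$console_home"/lib/
}
```

Usage would look like `set_spark_master conf/hadoop/hadoop-gryo.properties spark://example-host:7077` followed by `copy_plugin_jars .`, both run from inside the unzipped console directory.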
I ran the following queries to validate that the output was correct, and
checked the Spark master and worker logs to make sure no exceptions were
thrown:
bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph -
HADOOP_GREMLIN_LIBS is set to:
/home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/ext/hadoop-gremlin/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('/home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/conf/hadoop/hadoop-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g=graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>6
gremlin> g.V().group().by(bothE().count())
==>[1:[v[6], v[5], v[2]], 3:[v[4], v[1], v[3]]]
gremlin> g.V().groupCount('a').by(label).cap('a')
==>[software:2, person:4]
gremlin> g.V().range(0,3)
WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native
library not loaded
==>v[4]
==>v[1]
==>v[6]
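The log check mentioned before the console session could be automated along these lines. This is a sketch under assumptions: `logs_with_exceptions` is a hypothetical helper, and the directory passed to it is whichever location holds the Spark master and worker logs on the machine in question.

```shell
# Hypothetical helper: print every file under directory $1 that contains
# the string "Exception". Prints nothing when the logs are clean.
logs_with_exceptions() {
    # "|| true" keeps the status at 0 when no file matches.
    find "$1" -type f -exec grep -l 'Exception' {} + || true
}
```

An empty result from `logs_with_exceptions` on the log directory corresponds to the "no exceptions were thrown" observation above.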
Based on this small sample of queries run against a standalone Spark cluster,
it appears that removing jsr305.jar from the standalone and/or distribution
jar does not adversely affect SparkGraphComputer functionality.
I will test the GiraphGraphComputer next, assuming this all looks correct
here.