[ https://issues.apache.org/jira/browse/TINKERPOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stephen mallette updated TINKERPOP-1117:
----------------------------------------
    Component/s: hadoop

> InputFormatRDD.readGraphRDD requires a valid gremlin.hadoop.inputLocation, breaking InputFormats (Cassandra, HBase) that don't need one
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1117
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1117
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.2.0-incubating
>            Reporter: Dylan Bethune-Waddell
>            Priority: Minor
>             Fix For: 3.2.0-incubating
>
>
> On line 43, the call to Constants.getSearchGraphLocation returns
> Optional.empty() if gremlin.hadoop.inputLocation=none as advised in Titan's
> CassandraInputFormat and HBaseInputFormat. Changing the readGraphRDD method
> to call .isPresent() and only set the storage location in the config if so
> allows SparkGraphComputer from the 3.2.0-SNAPSHOT branch to work with Titan
> via CassandraInputFormat in a traversal source:
> {code}
> // Imports
> import java.util.Optional;
>
> @Override
> public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration, final JavaSparkContext sparkContext) {
>     final org.apache.hadoop.conf.Configuration hadoopConfiguration = ConfUtil.makeHadoopConfiguration(configuration);
>     // This part was used directly in hadoopConfiguration.set(...)
>     final Optional<String> searchGraph = Constants.getSearchGraphLocation(configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION), FileSystemStorage.open(hadoopConfiguration));
>     if (searchGraph.isPresent()) {
>         hadoopConfiguration.set(configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION), searchGraph.get());
>     }
>     return sparkContext.newAPIHadoopRDD(hadoopConfiguration,
>             (Class<InputFormat<NullWritable, VertexWritable>>) hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, InputFormat.class),
>             NullWritable.class,
>             VertexWritable.class)
>             .mapToPair(tuple -> new Tuple2<>(tuple._2().get().id(), new VertexWritable(tuple._2().get())));
> {code}
> I don't really understand the intended behaviour, so this is probably not the
> right thing to do. Would the addition of a configuration variable such as
> "gremlin.hadoop.inputLocationRequired" that defaults to true, and can be set
> to false for these other input formats work?
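
For illustration only, the two ideas in the report (guarding on .isPresent(), and an opt-out flag) can be sketched without any TinkerPop or Hadoop classes. Everything below is an assumption made for the sketch: plain maps stand in for the two configuration objects, resolveLocation is a hypothetical stand-in for Constants.getSearchGraphLocation that only checks whether the path exists, and "gremlin.hadoop.inputLocationRequired" is the setting proposed in the issue, not an existing one. The sketch also assumes the resolved value is meant to be written back under the gremlin.hadoop.inputLocation key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class InputLocationSketch {
    static final String INPUT_LOCATION = "gremlin.hadoop.inputLocation";
    // Hypothetical flag proposed in the issue; not an existing TinkerPop setting.
    static final String INPUT_LOCATION_REQUIRED = "gremlin.hadoop.inputLocationRequired";

    // Hypothetical stand-in for Constants.getSearchGraphLocation:
    // returns the location only if it exists in the (simulated) storage.
    static Optional<String> resolveLocation(final String location, final Set<String> existingPaths) {
        if (location == null || !existingPaths.contains(location)) {
            return Optional.empty();
        }
        return Optional.of(location);
    }

    // Copies the configuration and overwrites the input location only when it resolves.
    // If it does not resolve, fail only when the (hypothetical) flag says it is required.
    static Map<String, String> prepare(final Map<String, String> configuration, final Set<String> existingPaths) {
        final Map<String, String> hadoopConfiguration = new HashMap<>(configuration);
        final Optional<String> searchGraph = resolveLocation(configuration.get(INPUT_LOCATION), existingPaths);
        final boolean required = Boolean.parseBoolean(configuration.getOrDefault(INPUT_LOCATION_REQUIRED, "true"));
        if (searchGraph.isPresent()) {
            hadoopConfiguration.put(INPUT_LOCATION, searchGraph.get());
        } else if (required) {
            throw new IllegalStateException(INPUT_LOCATION + " does not resolve to an existing location");
        }
        return hadoopConfiguration;
    }

    public static void main(final String[] args) {
        final Map<String, String> configuration = new HashMap<>();
        configuration.put(INPUT_LOCATION, "none");
        configuration.put(INPUT_LOCATION_REQUIRED, "false");
        // No paths exist in the simulated storage, as with a Cassandra- or HBase-backed input format.
        System.out.println(prepare(configuration, Set.of()));
    }
}
```

With the flag left at its default of true, the same call throws instead of silently continuing, which keeps the current behaviour for file-based input formats.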