Dylan Bethune-Waddell created TINKERPOP-1117:
------------------------------------------------

             Summary: InputFormatRDD.readGraphRDD requires a valid gremlin.hadoop.inputLocation, breaking InputFormats (Cassandra, HBase) that don't need one
                 Key: TINKERPOP-1117
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1117
             Project: TinkerPop
          Issue Type: Improvement
    Affects Versions: 3.2.0-incubating
            Reporter: Dylan Bethune-Waddell
            Priority: Minor
             Fix For: 3.2.0-incubating


On line 43 of InputFormatRDD, the call to Constants.getSearchGraphLocation returns 
Optional.empty() when gremlin.hadoop.inputLocation=none, which is the setting 
advised for Titan's CassandraInputFormat and HBaseInputFormat. Because the result 
is unwrapped unconditionally, readGraphRDD fails for these formats. Changing 
readGraphRDD to call .isPresent() and set the input location in the Hadoop 
configuration only when one is present allows SparkGraphComputer from the 
3.2.0-SNAPSHOT branch to work with Titan via CassandraInputFormat in a traversal 
source:

{code}
// Imports needed in addition to those already in InputFormatRDD
import java.util.Optional;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

@Override
@SuppressWarnings("unchecked")
public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration, final JavaSparkContext sparkContext) {
    final org.apache.hadoop.conf.Configuration hadoopConfiguration = ConfUtil.makeHadoopConfiguration(configuration);
    // Previously this lookup was used directly inside hadoopConfiguration.set(...);
    // unwrapping the empty Optional returned for inputLocation=none with .get() throws NoSuchElementException.
    final Optional<String> searchGraph = Constants.getSearchGraphLocation(
            configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION),
            FileSystemStorage.open(hadoopConfiguration));
    // Only file-based InputFormats need an input directory; Titan's Cassandra/HBase formats do not.
    // The key is Hadoop's input-directory property, not the value of gremlin.hadoop.inputLocation.
    if (searchGraph.isPresent()) {
        hadoopConfiguration.set(FileInputFormat.INPUT_DIR, searchGraph.get());
    }
    return sparkContext.newAPIHadoopRDD(hadoopConfiguration,
            (Class<InputFormat<NullWritable, VertexWritable>>) hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, InputFormat.class),
            NullWritable.class,
            VertexWritable.class)
            .mapToPair(tuple -> new Tuple2<>(tuple._2().get().id(), new VertexWritable(tuple._2().get())));
}
{code}
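To try the isPresent() guard in isolation, here is a minimal standalone sketch of the same decision. It assumes a plain Map<String, String> in place of Hadoop's Configuration, and searchGraphLocation is a hypothetical, simplified stand-in for Constants.getSearchGraphLocation: the real method consults the storage, while this one only treats "none" or a missing value as absent.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class InputLocationGuard {
    // Hadoop's FileInputFormat reads its input directory from this key.
    static final String INPUT_DIR = "mapreduce.input.fileinputformat.inputdir";

    // Hypothetical, simplified stand-in for Constants.getSearchGraphLocation:
    // "none" or a missing value means "no location"; anything else is returned as-is.
    static Optional<String> searchGraphLocation(final String inputLocation) {
        if (inputLocation == null || inputLocation.equals("none")) {
            return Optional.empty();
        }
        return Optional.of(inputLocation);
    }

    // Copies the location into the (stand-in) Hadoop config only when one is present,
    // so non-file InputFormats such as Cassandra/HBase never see an input directory.
    static Map<String, String> configure(final String inputLocation) {
        final Map<String, String> hadoopConf = new HashMap<>();
        searchGraphLocation(inputLocation).ifPresent(loc -> hadoopConf.put(INPUT_DIR, loc));
        return hadoopConf;
    }

    public static void main(final String[] args) {
        System.out.println(configure("none"));    // prints {}
        System.out.println(configure("/tmp/g"));  // prints {mapreduce.input.fileinputformat.inputdir=/tmp/g}
    }
}
{code}

Using ifPresent rather than get() means an absent location is simply skipped instead of raising NoSuchElementException.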

I'm not sure what the intended behaviour is, so this may not be the right fix. 
Alternatively, would it work to add a configuration property such as 
"gremlin.hadoop.inputLocationRequired" that defaults to true and can be set to 
false for input formats like these that don't read from a file location?
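The proposed flag could behave roughly as follows. This is a sketch under stated assumptions: "gremlin.hadoop.inputLocationRequired" is only the name suggested in this issue, not an existing TinkerPop property; a Map<String, String> stands in for the configuration; and resolve is a hypothetical helper that receives the result of the location lookup.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class InputLocationPolicy {
    // Hypothetical key proposed in this issue; not an existing TinkerPop setting.
    static final String INPUT_LOCATION_REQUIRED = "gremlin.hadoop.inputLocationRequired";
    static final String INPUT_LOCATION = "gremlin.hadoop.inputLocation";

    // Returns the resolved location, or empty when the InputFormat does not need one.
    // When the flag is true (the default) and nothing was found, fail with a clear message.
    static Optional<String> resolve(final Map<String, String> conf, final Optional<String> searchGraph) {
        final boolean required = Boolean.parseBoolean(conf.getOrDefault(INPUT_LOCATION_REQUIRED, "true"));
        if (required && !searchGraph.isPresent()) {
            throw new IllegalStateException("No graph found at " + INPUT_LOCATION + "=" + conf.get(INPUT_LOCATION));
        }
        return searchGraph;
    }

    public static void main(final String[] args) {
        // A Titan/Cassandra-style setup that opts out of the requirement.
        final Map<String, String> titanConf = new HashMap<>();
        titanConf.put(INPUT_LOCATION, "none");
        titanConf.put(INPUT_LOCATION_REQUIRED, "false");
        System.out.println(resolve(titanConf, Optional.empty())); // prints Optional.empty
    }
}
{code}

Defaulting the flag to true keeps the current requirement for file-based formats, so only Cassandra/HBase users would need to opt out explicitly.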



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
