[ https://issues.apache.org/jira/browse/TINKERPOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stephen mallette updated TINKERPOP-1117:
----------------------------------------
    Component/s: hadoop

> InputFormatRDD.readGraphRDD requires a valid gremlin.hadoop.inputLocation, breaking InputFormats (Cassandra, HBase) that don't need one
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1117
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1117
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.2.0-incubating
>            Reporter: Dylan Bethune-Waddell
>            Priority: Minor
>             Fix For: 3.2.0-incubating
>
>
> On line 43 of InputFormatRDD, the call to Constants.getSearchGraphLocation
> returns Optional.empty() when gremlin.hadoop.inputLocation=none, the setting
> advised for Titan's CassandraInputFormat and HBaseInputFormat. Because
> readGraphRDD passes that result directly to hadoopConfiguration.set(...), these
> InputFormats break. Changing readGraphRDD to call isPresent() and set the input
> location in the Hadoop configuration only when a value is present allows
> SparkGraphComputer from the 3.2.0-SNAPSHOT branch to work with Titan via
> CassandraInputFormat in a traversal source:
> {code}
> // Imports (in addition to those already in InputFormatRDD)
> import java.util.Optional;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> @SuppressWarnings("unchecked")
> @Override
> public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration, final JavaSparkContext sparkContext) {
>     final org.apache.hadoop.conf.Configuration hadoopConfiguration = ConfUtil.makeHadoopConfiguration(configuration);
>     // Previously this result was passed directly to hadoopConfiguration.set(...),
>     // which fails when gremlin.hadoop.inputLocation=none and the Optional is empty.
>     final Optional<String> searchGraph = Constants.getSearchGraphLocation(
>             configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION),
>             FileSystemStorage.open(hadoopConfiguration));
>     // Only file-based InputFormats need an input directory; set it only if one was resolved.
>     // FileInputFormat.INPUT_DIR is Hadoop's "mapreduce.input.fileinputformat.inputdir" key.
>     if (searchGraph.isPresent()) {
>         hadoopConfiguration.set(FileInputFormat.INPUT_DIR, searchGraph.get());
>     }
>     return sparkContext.newAPIHadoopRDD(hadoopConfiguration,
>             (Class<InputFormat<NullWritable, VertexWritable>>) hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, InputFormat.class),
>             NullWritable.class,
>             VertexWritable.class)
>             .mapToPair(tuple -> new Tuple2<>(tuple._2().get().id(), new VertexWritable(tuple._2().get())));
> }
> {code}
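The failure mode and the isPresent() guard can be reproduced outside Spark with a small standalone sketch. It assumes plain Maps as stand-ins for the TinkerPop and Hadoop Configuration objects, and searchGraphLocation below is a hypothetical simplification of Constants.getSearchGraphLocation that only handles the "none" case (the real method also consults the file system).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Standalone sketch: plain Maps stand in for the TinkerPop and Hadoop configurations.
class InputLocationGuard {

    // Hypothetical stand-in for Constants.getSearchGraphLocation: returns
    // Optional.empty() when the location is missing or the literal "none".
    static Optional<String> searchGraphLocation(final String location) {
        if (location == null || location.equals("none")) {
            return Optional.empty();
        }
        return Optional.of(location);
    }

    // Copies the resolved location into the Hadoop configuration stand-in
    // only when one is present, mirroring the isPresent() check in the patch.
    static void applyInputLocation(final Map<String, String> graphConf,
                                   final Map<String, String> hadoopConf) {
        searchGraphLocation(graphConf.get("gremlin.hadoop.inputLocation"))
                .ifPresent(loc -> hadoopConf.put("mapreduce.input.fileinputformat.inputdir", loc));
    }

    public static void main(final String[] args) {
        final Map<String, String> graphConf = new HashMap<>();
        graphConf.put("gremlin.hadoop.inputLocation", "none");
        final Map<String, String> hadoopConf = new HashMap<>();
        applyInputLocation(graphConf, hadoopConf);
        System.out.println(hadoopConf.isEmpty()); // true: nothing is set for "none"

        // Calling get() unconditionally is what breaks for "none".
        try {
            searchGraphLocation("none").get();
        } catch (final NoSuchElementException e) {
            System.out.println("get() on empty Optional: " + e.getClass().getSimpleName());
        }
    }
}
```

Using ifPresent keeps the "no location" path a no-op instead of an exception, which is the behaviour the Cassandra and HBase InputFormats need.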
> I don't fully understand the intended behaviour, so this may not be the right
> fix. Would adding a configuration option such as
> "gremlin.hadoop.inputLocationRequired", defaulting to true and set to false for
> these other InputFormats, work?
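A rough sketch of how the proposed flag might behave, under stated assumptions: "gremlin.hadoop.inputLocationRequired" is only the option suggested in this issue (not an existing TinkerPop setting), plain Maps again stand in for the real Configuration classes, and the file-system lookup done by Constants.getSearchGraphLocation is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of the proposed (hypothetical) gremlin.hadoop.inputLocationRequired flag.
// Plain Maps stand in for the TinkerPop and Hadoop Configuration objects.
class InputLocationPolicy {

    static final String INPUT_LOCATION = "gremlin.hadoop.inputLocation";
    static final String INPUT_LOCATION_REQUIRED = "gremlin.hadoop.inputLocationRequired";
    static final String HADOOP_INPUT_DIR = "mapreduce.input.fileinputformat.inputdir";

    static void applyInputLocation(final Map<String, String> graphConf,
                                   final Map<String, String> hadoopConf) {
        // Defaults to true so file-based InputFormats still fail fast on a missing location.
        final boolean required = Boolean.parseBoolean(
                graphConf.getOrDefault(INPUT_LOCATION_REQUIRED, "true"));
        // Treat an absent value or the literal "none" as "no location".
        final Optional<String> location = Optional.ofNullable(graphConf.get(INPUT_LOCATION))
                .filter(loc -> !loc.equals("none"));
        if (location.isPresent()) {
            hadoopConf.put(HADOOP_INPUT_DIR, location.get());
        } else if (required) {
            throw new IllegalStateException(INPUT_LOCATION + " must be set when "
                    + INPUT_LOCATION_REQUIRED + " is true");
        }
        // Otherwise (e.g. Cassandra/HBase InputFormats) the input dir is left unset.
    }

    public static void main(final String[] args) {
        final Map<String, String> cassandraConf = new HashMap<>();
        cassandraConf.put(INPUT_LOCATION, "none");
        cassandraConf.put(INPUT_LOCATION_REQUIRED, "false");
        final Map<String, String> hadoopConf = new HashMap<>();
        applyInputLocation(cassandraConf, hadoopConf);
        System.out.println(hadoopConf.containsKey(HADOOP_INPUT_DIR)); // false
    }
}
```

The default of true preserves the current strict behaviour for file-based formats, while non-file InputFormats opt out explicitly rather than relying on the "none" value alone.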



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
