Dylan Bethune-Waddell created TINKERPOP-1315:
------------------------------------------------
Summary: HadoopConfiguration will not allow an ArrayList to be
serialized in vertexProgram configuration unless setProperty is overridden
Key: TINKERPOP-1315
URL: https://issues.apache.org/jira/browse/TINKERPOP-1315
Project: TinkerPop
Issue Type: Improvement
Components: hadoop
Affects Versions: 3.2.1
Reporter: Dylan Bethune-Waddell
Priority: Minor
I have been implementing a "PrecisionBulkLoader" class that takes a
ScriptTraversal with bindings and executes it against the target graph to
getOrCreate vertices/edges with more precision. This follows from my
realization that IncrementalBulkLoader currently overwrites the first edge
with the same label that it finds between the two vertex endpoints in the
target graph, which is an issue for self-loops and multi-edges:
https://issues.apache.org/jira/browse/TINKERPOP-1099
I finally got it to work with the script bindings being propagated to workers,
but to do so without just taking the last value of the array I had to
override the setProperty method in
org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration. Before I
did that, whenever ConfigurationUtils.copy(conf1, conf2) was called with a
HadoopConfiguration on either end (conf1 or conf2), any multi-valued / list
properties got clobbered, and only the last value remained after
storeState/loadState went through the first cycle in BulkLoaderVertexProgram.
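To make the clobbering concrete, here is a minimal, self-contained sketch of the kind of behaviour described above. It is a toy model only and does not use the real HadoopConfiguration or commons-configuration classes: FlatteningConfig, ListPreservingConfig and copy are hypothetical names, and the assumption that list values are flattened element by element into an overwriting put() is my reading of the symptom, not something confirmed against the TinkerPop source.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListPropertyDemo {

    /** Toy config (hypothetical): list values are flattened on set. */
    public static class FlatteningConfig {
        protected final Map<String, Object> properties = new LinkedHashMap<>();

        public void setProperty(String key, Object value) {
            properties.remove(key);
            if (value instanceof Iterable) {
                // Each element is added separately...
                for (Object element : (Iterable<?>) value) {
                    addPropertyDirect(key, element);
                }
            } else {
                addPropertyDirect(key, value);
            }
        }

        protected void addPropertyDirect(String key, Object value) {
            // ...but put() overwrites, so only the last element survives.
            properties.put(key, value);
        }

        public Object getProperty(String key) {
            return properties.get(key);
        }

        public Iterable<String> getKeys() {
            return properties.keySet();
        }
    }

    /** Toy config (hypothetical) with setProperty overridden to keep lists whole. */
    public static class ListPreservingConfig extends FlatteningConfig {
        @Override
        public void setProperty(String key, Object value) {
            properties.put(key, value);
        }
    }

    /** Stand-in for a key-by-key configuration copy. */
    public static void copy(FlatteningConfig source, FlatteningConfig target) {
        for (String key : source.getKeys()) {
            target.setProperty(key, source.getProperty(key));
        }
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2");
        ListPreservingConfig source = new ListPreservingConfig();
        source.setProperty("storage.hostname", hosts); // illustrative key

        FlatteningConfig clobbered = new FlatteningConfig();
        copy(source, clobbered);
        System.out.println(clobbered.getProperty("storage.hostname")); // host2

        ListPreservingConfig preserved = new ListPreservingConfig();
        copy(source, preserved);
        System.out.println(preserved.getProperty("storage.hostname")); // [host1, host2]
    }
}
```

In this sketch the override simply stores the value as given, which is enough for the list to survive a copy in either direction as long as the target uses the overriding class.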
This had been bugging me for a while: with multiple hosts configured for
TitanGraph in the config, the HadoopConf only opened a connection against the
last host in the list. With this change to HadoopConfiguration, the Spark
executor logs instead read
standardtitangraph[cassandrathrift:[host1, host2, ...]], as you might expect,
and the bindings for the ScriptTraversal survive storeState/loadState and are
applied to the traversal.
I suppose I am wondering whether this change is dangerous or bad somehow. In a
few places I saw the values of the configuration being explicitly
toString()'d...
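As a small illustration of why the toString() calls might matter: if a configuration value is now a List rather than a single String, code that stringifies it gets the bracketed list form. The sketch below uses only java.util; the stringify helper is a hypothetical stand-in for such a call site, not code taken from TinkerPop.

```java
import java.util.Arrays;
import java.util.List;

public class ToStringRoundTrip {

    // Hypothetical stand-in for a call site that does value.toString().
    public static String stringify(Object value) {
        return value.toString();
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2");
        String flattened = stringify(hosts);

        // The list's toString() includes brackets and ", " separators.
        System.out.println(flattened); // [host1, host2]

        // It is therefore neither a single host nor a plain comma-separated list.
        System.out.println(flattened.equals("host1,host2")); // false
    }
}
```

Any consumer that expects a single value or a plain delimited string at such a call site would need to handle the bracketed form, or the list would have to be joined explicitly before being stringified.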