[ https://issues.apache.org/jira/browse/TINKERPOP-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307762#comment-15307762 ]
Marko A. Rodriguez commented on TINKERPOP-1315:
-----------------------------------------------

This is not a function of {{HadoopConfiguration}} but of Apache Configuration. Collections are automatically turned into {{Configuration}} arrays. This has been a thorn in my side many times, but we can't just change {{HadoopConfiguration}} to override Apache Configuration's expected behavior, because of all the other uses of Apache Configuration (like reading from and writing to a properties file, etc.).

> HadoopConfiguration will not allow an ArrayList to be serialized in
> vertexProgram configuration unless setProperty is overridden
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1315
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1315
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.2.1
>            Reporter: Dylan Bethune-Waddell
>            Priority: Minor
>
> I have been implementing a "PrecisionBulkLoader" class that takes a
> ScriptTraversal with bindings that can execute against the target graph to
> getOrCreate vertices/edges with more precision. This follows from my
> realization that IncrementalBulkLoader currently overwrites the first
> edge of the same label between the two vertex endpoints in the target
> graph, which is an issue for self-loops and multi-edges:
> https://issues.apache.org/jira/browse/TINKERPOP-1099
>
> I finally got it to work with the script bindings being propagated to
> workers, but to do so without just taking the last value of the
> Array I had to override the setProperty method in
> org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration. Before I
> did that, when ConfigurationUtils.copy(conf1, conf2) was called with a
> HadoopConfiguration on either end (conf1 or conf2), any multi-valued/list
> properties got clobbered, and only the last value remained after
> storeState/loadState went through the first cycle in BulkLoaderVertexProgram.
>
> This had been bugging me for a while with multiple hosts
> configured for TitanGraph in the config and the HadoopConf only opening a
> connection against the last host in the list. With this change to
> HadoopConfiguration, the Spark executor logs read
> standardtitangraph[cassandrathrift:[host1, host2, ...]], as you might
> expect, and the bindings for the ScriptTraversal survive
> storeState/loadState and are applied to the traversal.
>
> I suppose I was wondering if this is dangerous or bad somehow? I know that in
> a few places I saw the values of the configuration being explicitly
> toString()'d...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
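The "only the last value survives" symptom in TINKERPOP-1315 can be modeled in a few lines of plain Java. This is a hypothetical sketch, not TinkerPop or Apache Commons Configuration code: `ClobberSketch`, `SingleValueStore`, and `JoiningStore` are illustrative names, and it assumes (1) the target keeps exactly one string per key and (2) a collection passed to `setProperty` is split into its elements, each written separately. Under those assumptions every element overwrites the previous one. `JoiningStore` shows one possible workaround (encoding the whole list as a single delimited string); it is not the override the reporter actually wrote.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical model of the clobbering; NOT the real TinkerPop/Apache Configuration classes. */
final class ClobberSketch {

    /** Assumed target: keeps a single String per key, like a plain key/value store. */
    static final class SingleValueStore {
        final Map<String, String> values = new HashMap<>();

        /** Splits a collection into elements and writes each one under the same key. */
        void setProperty(String key, Object value) {
            if (value instanceof Collection) {
                for (Object element : (Collection<?>) value) {
                    // Each put replaces the previous element, so only the last one remains.
                    values.put(key, String.valueOf(element));
                }
            } else {
                values.put(key, String.valueOf(value));
            }
        }
    }

    /** Hypothetical workaround: store the whole collection as one comma-delimited string. */
    static final class JoiningStore {
        final Map<String, String> values = new HashMap<>();

        void setProperty(String key, Object value) {
            if (value instanceof Collection) {
                StringJoiner joined = new StringJoiner(",");
                for (Object element : (Collection<?>) value) {
                    joined.add(String.valueOf(element));
                }
                values.put(key, joined.toString());
            } else {
                values.put(key, String.valueOf(value));
            }
        }

        /** Splits the stored string back into a list; assumes elements are non-empty and comma-free. */
        List<String> getList(String key) {
            String raw = values.get(key);
            if (raw == null || raw.isEmpty()) {
                return Collections.emptyList();
            }
            return Arrays.asList(raw.split(",", -1));
        }
    }
}
```

With `hosts = [host1, host2, host3]`, `SingleValueStore` ends up holding only `host3`, which mirrors the "connection against the last host in the list" symptom, while `JoiningStore.getList("hosts")` returns all three hosts. Any change of this kind to a shared configuration class still carries the concern raised in the comment: other consumers (for example properties-file readers and writers) would see the new encoding too.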