[jira] [Commented] (TINKERPOP-1315) HadoopConfiguration will not allow an ArrayList to be serialized in vertexProgram configuration unless setProperty is overriden

Marko A. Rodriguez (JIRA) Tue, 31 May 2016 07:00:47 -0700

    [ 
https://issues.apache.org/jira/browse/TINKERPOP-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307762#comment-15307762
 ]


Marko A. Rodriguez commented on TINKERPOP-1315:
-----------------------------------------------

This is not a function of {{HadoopConfiguration}}, but of Apache Commons 
Configuration. Collections are automatically turned into {{Configuration}} 
arrays. This has been a thorn in my side many times, but we can't just change 
{{HadoopConfiguration}} to override Apache Configuration's expected behavior, 
because of all the other uses of Apache Configuration (like reading/writing 
from/to a properties file, etc.).
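
To make the behavior concrete, here is a minimal, self-contained sketch. It is a toy model, not the actual Commons Configuration or {{HadoopConfiguration}} source: the class names, the "hosts" key, and the single-valued map store are all assumptions made for illustration. It shows how unrolling a collection into individual values, combined with a store that keeps one value per key, leaves only the last element, and how overriding {{setProperty}} (the workaround described in the report) keeps the list intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListClobberSketch {

    /** Hypothetical stand-in for a configuration that unrolls collections on write. */
    static class FlatteningConfig {
        protected final Map<String, Object> store = new HashMap<>();

        // Assumption: the backing store holds one value per key,
        // so each put replaces whatever was stored before.
        protected void addPropertyDirect(String key, Object value) {
            store.put(key, value);
        }

        // Collections are unrolled and their elements added one at a time.
        public void setProperty(String key, Object value) {
            store.remove(key);
            if (value instanceof Collection) {
                for (Object element : (Collection<?>) value) {
                    addPropertyDirect(key, element);
                }
            } else {
                addPropertyDirect(key, value);
            }
        }

        public Object getProperty(String key) {
            return store.get(key);
        }
    }

    /** Hypothetical variant: setProperty is overridden to store the collection whole. */
    static class ListPreservingConfig extends FlatteningConfig {
        @Override
        public void setProperty(String key, Object value) {
            store.remove(key);
            if (value instanceof Collection) {
                // Copy defensively so later changes to the caller's list do not leak in.
                addPropertyDirect(key, new ArrayList<Object>((Collection<?>) value));
            } else {
                addPropertyDirect(key, value);
            }
        }
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2");

        FlatteningConfig flat = new FlatteningConfig();
        flat.setProperty("hosts", hosts);
        System.out.println(flat.getProperty("hosts")); // prints host2

        FlatteningConfig kept = new ListPreservingConfig();
        kept.setProperty("hosts", hosts);
        System.out.println(kept.getProperty("hosts")); // prints [host1, host2]
    }
}
```

The trade-off is the one raised above: a subclass that stores the list whole no longer behaves like the base class for callers that expect one value per entry, such as code that writes the configuration out to a properties file.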

> HadoopConfiguration will not allow an ArrayList to be serialized in 
> vertexProgram configuration unless setProperty is overriden
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1315
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1315
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.2.1
>            Reporter: Dylan Bethune-Waddell
>            Priority: Minor
>
> I have been implementing a "PrecisionBulkLoader" class that takes a 
> ScriptTraversal with bindings that can execute against the target graph to 
> getOrCreate vertices/edges with more precision - this follows from my 
> realization that currently IncrementalBulkLoader will overwrite the first 
> edge of the same label in the target graph that is between the two vertex 
> endpoints - this is an issue for self-loops and multi-edges:
> https://issues.apache.org/jira/browse/TINKERPOP-1099
> I finally got it to work with the script bindings being propagated to 
> workers, but in order to do so without just taking the last value of the 
> Array I had to override the setProperty method in 
> org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration - before I 
> did that, when ConfigurationUtils.copy(conf1, conf2) was called with a 
> HadoopConfiguration on either end (conf1 or conf2), any multi-valued / list 
> properties get clobbered and only the last value would be there after 
> storeState/loadState goes through the first cycle in BulkLoaderVertexProgram. 
> This had been bugging me for a while: with multiple hosts configured for 
> TitanGraph in the config, the HadoopConf only opened a connection against 
> the last host in the list. With this change to HadoopConfiguration, the 
> Spark executor logs instead read 
> standardtitangraph[cassandrathrift:[host1, host2, ...]], as you might 
> expect, and the bindings for the ScriptTraversal survive 
> storeState/loadState and are applied to the traversal.
> I suppose I was wondering if this is dangerous or bad somehow? I know that in 
> a few places I saw the values of the configuration being explicitly 
> toString()'d...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
