[ 
https://issues.apache.org/jira/browse/TINKERPOP-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stephen mallette updated TINKERPOP-1343:
----------------------------------------
    Fix Version/s:     (was: 3.3.1)

> A more efficient StarGraph serialization representation.
> --------------------------------------------------------
>
>                 Key: TINKERPOP-1343
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1343
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a 
> vertex, its properties, its incident edges, and their properties. In essence, 
> one "row of an adjacency list."
> Here are some ideas on how to make the next version of the serialization 
> format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}. 
> This is wasteful because the class must be written with each id. It would be 
> better if the {{StarGraph}} had metadata like {{vertexIdClass}}, 
> {{vertexPropertyIdClass}}, and {{edgeIdClass}}. Now for every vertex we are 
> serializing three classes, but the benefit is that every id class is now known 
> and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id, 
> otherVertexId]\*]\*}} and {{[ propertyKey[ vertexProperty[ 
> id,propertyValue]\*]\*}}, respectively. This ensures we don't write so many 
> strings as all edges/vertex properties are grouped by label. However, we do 
> NOT do this for edge properties or vertex property properties. We simply 
> write out the {{Map<Object,Map<String,Object>>}} which is 
> {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose 
> between grouping by edgeId or by propertyKey, we should keep it as it is, but 
> create a "meta map" that lets us represent all property keys in a compact 
> space, e.g., {{int}}. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}} 
> where we also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized 
> with the {{StarGraph}}.
> There are a few other tickets around optimizing {{StarGraph}} here:
> https://issues.apache.org/jira/browse/TINKERPOP-1128 (making {{GraphFilters}} 
> more efficient)
> https://issues.apache.org/jira/browse/TINKERPOP-1122 (pointless bits and 
> {{StarGraph}} should never auto-generate IDs as the ID space is distributed).
> https://issues.apache.org/jira/browse/TINKERPOP-1287 (related to heap usage 
> and clock cycles -- not serialization).
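
As a rough illustration of ideas 1 and 2 in the description above, here is a minimal plain-Java sketch. It is not TinkerPop or Kryo code: the class StarGraphFormatSketch, its methods, and the use of DataOutputStream are all assumptions made for illustration. In particular, writing a full class name per id is only a stand-in for per-id class information; Kryo's real class encoding is more compact than that.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in TinkerPop or Kryo.
final class StarGraphFormatSketch {

    // Idea 1, current style: class information precedes every id
    // (a stand-in for what readClassAndObject/writeClassAndObject implies).
    static byte[] writeIdsWithClassPerId(List<Long> ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ids.size());
            for (Long id : ids) {
                out.writeUTF(id.getClass().getName());
                out.writeLong(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Idea 1, proposed style: the id class is written once as metadata
    // (e.g. edgeIdClass), then each id is written without class information.
    static byte[] writeIdsWithHeader(List<Long> ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(Long.class.getName());
            out.writeInt(ids.size());
            for (Long id : ids) {
                out.writeLong(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Idea 2: the "meta map" plus the property maps rewritten to use int keys.
    static final class EncodedProperties {
        final Map<Integer, String> keyTable = new LinkedHashMap<>();
        final Map<Object, Map<Integer, Object>> properties = new LinkedHashMap<>();
    }

    // Assigns each distinct property key an int id in first-seen order and
    // rewrites Map<EdgeId,Map<PropertyKey,PropertyValue>> accordingly.
    static EncodedProperties internKeys(Map<Object, Map<String, Object>> edgeProperties) {
        EncodedProperties encoded = new EncodedProperties();
        Map<String, Integer> keyToId = new LinkedHashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : edgeProperties.entrySet()) {
            Map<Integer, Object> compact = new LinkedHashMap<>();
            for (Map.Entry<String, Object> property : edge.getValue().entrySet()) {
                Integer keyId = keyToId.get(property.getKey());
                if (keyId == null) {
                    keyId = keyToId.size();
                    keyToId.put(property.getKey(), keyId);
                    encoded.keyTable.put(keyId, property.getKey());
                }
                compact.put(keyId, property.getValue());
            }
            encoded.properties.put(edge.getKey(), compact);
        }
        return encoded;
    }
}
```

In this sketch the header variant produces fewer bytes than the per-id variant as soon as there is more than one id, and each property key string appears only once, in keyTable, no matter how many edges carry it. How much a real Kryo-based format would save depends on class registration and on the actual id types.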



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
