[ https://issues.apache.org/jira/browse/TINKERPOP-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stephen mallette updated TINKERPOP-1343:
----------------------------------------
    Fix Version/s:     (was: 3.3.1)

> A more efficient StarGraph serialization representation.
> --------------------------------------------------------
>
>                 Key: TINKERPOP-1343
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1343
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a
> vertex, its properties, its incident edges, and their properties. In essence,
> one "row of an adjacency list."
> Here are some ideas on how to make the next version of the serialization
> format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}.
> This is bad because we have to write the class with each id. It would be
> better if the {{StarGraph}} had metadata like {{vertexIdClass}},
> {{vertexPropertyIdClass}}, and {{edgeIdClass}}. Now for every vertex we are
> serializing three classes, but the benefit is that every id class is now
> known and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id,
> otherVertexId]\*]\*}} and {{[ propertyKey[ vertexProperty[
> id,propertyValue]\*]\*}}, respectively. This avoids writing so many strings,
> since all edges/vertex properties are grouped by label. However, we do
> NOT do this for edge properties nor vertex property properties. We simply
> write out the {{Map<Object,Map<String,Object>>}}, which is
> {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose
> between grouping by edgeId or by propertyKey, we should keep it as it is, but
> create a "meta map" that allows us to represent all property keys in, e.g.,
> an {{int}} space. Thus we would write
> {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}}, together with a
> {{Map<PropertyKeyIntegerId,String>}} that is serialized with the
> {{StarGraph}}.
> There are a few other tickets around optimizing {{StarGraph}} here:
> https://issues.apache.org/jira/browse/TINKERPOP-1128 (making {{GraphFilters}}
> more efficient)
> https://issues.apache.org/jira/browse/TINKERPOP-1122 (pointless bits and
> {{StarGraph}} should never auto-generate IDs as the ID space is distributed).
> https://issues.apache.org/jira/browse/TINKERPOP-1287 (related to heap usage
> and clock cycles -- not serialization).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
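For illustration, the "meta map" idea from point 2 of the ticket can be sketched in plain Java, without any TinkerPop or Kryo dependency. This is only a sketch under stated assumptions: the class name {{PropertyKeyDictionarySketch}} and the methods {{buildKeyIds}}, {{invert}}, {{encode}}, and {{decode}} are hypothetical and do not exist in TinkerPop, and the real serializer would still have to write both maps through Kryo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in TinkerPop.
public class PropertyKeyDictionarySketch {

    // Assign each distinct property key a small int id, in first-seen order.
    static Map<String, Integer> buildKeyIds(Map<Object, Map<String, Object>> edgeProperties) {
        Map<String, Integer> keyIds = new LinkedHashMap<>();
        for (Map<String, Object> properties : edgeProperties.values()) {
            for (String key : properties.keySet()) {
                keyIds.putIfAbsent(key, keyIds.size());
            }
        }
        return keyIds;
    }

    // Invert the dictionary into the Map<PropertyKeyIntegerId,String> that
    // would be serialized once alongside the star graph.
    static Map<Integer, String> invert(Map<String, Integer> keyIds) {
        Map<Integer, String> idToKey = new LinkedHashMap<>();
        keyIds.forEach((key, id) -> idToKey.put(id, key));
        return idToKey;
    }

    // Map<EdgeId,Map<PropertyKey,PropertyValue>>
    //   -> Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>
    static Map<Object, Map<Integer, Object>> encode(Map<Object, Map<String, Object>> edgeProperties,
                                                    Map<String, Integer> keyIds) {
        Map<Object, Map<Integer, Object>> encoded = new LinkedHashMap<>();
        edgeProperties.forEach((edgeId, properties) -> {
            Map<Integer, Object> compact = new LinkedHashMap<>();
            properties.forEach((key, value) -> compact.put(keyIds.get(key), value));
            encoded.put(edgeId, compact);
        });
        return encoded;
    }

    // Reverse of encode: restore the string keys from the int ids.
    static Map<Object, Map<String, Object>> decode(Map<Object, Map<Integer, Object>> encoded,
                                                   Map<Integer, String> idToKey) {
        Map<Object, Map<String, Object>> decoded = new LinkedHashMap<>();
        encoded.forEach((edgeId, compact) -> {
            Map<String, Object> properties = new LinkedHashMap<>();
            compact.forEach((id, value) -> properties.put(idToKey.get(id), value));
            decoded.put(edgeId, properties);
        });
        return decoded;
    }

    public static void main(String[] args) {
        // Two edges, keyed by edge id, sharing the "weight" property key.
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("weight", 0.5);
        first.put("since", 2009);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("weight", 1.0);
        Map<Object, Map<String, Object>> edgeProperties = new LinkedHashMap<>();
        edgeProperties.put(7L, first);
        edgeProperties.put(8L, second);

        Map<String, Integer> keyIds = buildKeyIds(edgeProperties);
        Map<Object, Map<Integer, Object>> encoded = encode(edgeProperties, keyIds);
        Map<Object, Map<String, Object>> decoded = decode(encoded, invert(keyIds));

        System.out.println(keyIds);                         // {weight=0, since=1}
        System.out.println(encoded);                        // {7={0=0.5, 1=2009}, 8={0=1.0}}
        System.out.println(decoded.equals(edgeProperties)); // true
    }
}
```

The grouping by edge id is kept as it is; only the repeated key strings are replaced by ints, so each key string would be written once per {{StarGraph}} instead of once per edge that carries it.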