GitHub user spmallette opened a pull request:

    https://github.com/apache/tinkerpop/pull/613

    TINKERPOP-1676 (tp32) GraphSON Performance

    https://issues.apache.org/jira/browse/TINKERPOP-1676
    
    Made some pretty good improvements to performance. I was mostly focused on 
the serialization of complete graph elements so using TinkerGraph as the basis 
for benchmark seemed like a good way to do that.  I subgraphed a portion of the 
Grateful Dead graph:
    
    ```text
    gremlin> subgraph = 
g.E().hasLabel('followedBy').subgraph('subGraph').cap('subGraph').next()
    ==>tinkergraph[vertices:338 edges:7047]
    ```
    
    I wanted a comparison with Gryo so I got that first:
    
    ```text
    gremlin> gryo = graph.io(IoCore.gryo()).mapper().create().createMapper()
    ==>org.apache.tinkerpop.shaded.kryo.Kryo@5edf2821
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   def output = new org.apache.tinkerpop.shaded.kryo.io.Output(s)
    ......3>   gryo.writeObject(output, subgraph) 
    ......4>   output.flush()
    ......5>   def input = new 
org.apache.tinkerpop.shaded.kryo.io.Input(s.toByteArray())
    ......6>   gryo.readObject(input, TinkerGraph.class) 
    ......7> }
    ==>55.96791679999999
    ```
    
    then compared head of tp32:
    
    ```text
    gremlin> graphson = 
GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
    ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@185f7840
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   graphson.writeValue(s, subgraph)
    ......3>   graphson.readValue(new 
java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
    ......4> }
    ==>723.1702541799999
    ```
    
    pretty bad - now GraphSON with changes from this branch:
    
    ```text
    gremlin> graphson = 
GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
    ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@563a89b5
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   graphson.writeValue(s, subgraph)
    ......3>   graphson.readValue(new 
java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
    ......4> }
    ==>67.0219281
    ```
    
    Not too far behind gryo now!
    
    Running full integration tests now and will update this PR when it's done 
successfully. 
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #613
    
----
commit 46e6e9767b6818f33b56366ee15d80cec5a908e0
Author: Stephen Mallette <[email protected]>
Date:   2017-05-23T20:03:17Z

    TINKERPOP-1676 Got rid of stream() usage
    
    Can't believe we still had stream() in here. Will it ever all be gone from 
these performance sensitive places?! Anyway, removed that and deprecated a 
constructor on DetachedEdge that was using Pair for no really good reason. No 
need to create extra Pair objects for that. They just sorta get thrown away 
after usage.

commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e
Author: Stephen Mallette <[email protected]>
Date:   2017-05-23T20:57:12Z

    TINKERPOP-1676 Improved speed and memory usage of GraphSON
    
    This change is specific to TinkerGraph and the serialization of 
vertices/edges/properties. Removed a "middle layer" of JSON to Object 
serialization which was coercing JSON to Map first and then converting that to 
the graph elements. Also made deserializers cacheable.

commit 02b007366b434918fe1181f99d80689c1c03684b
Author: Stephen Mallette <[email protected]>
Date:   2017-05-24T00:13:46Z

    TINKERPOP-1676 Performance enhancement to graphson serialization
    
    Focuses on speeding up serialization of graph elements. Prevent use of 
generic maps and stream graphson data directly into "detached" elements. Not 
using intermediate maps cut down on memory usage and a bunch of jackson 
reflection calls (still not sure what they were for and why they were not made 
to be more efficient). It did mean some ugly changes to "detached" stuff. Will 
need to maybe refactor some more, but the basic premise seems to be proven.

commit e5d2d2bc000654fd026521219cf30bac265139a3
Author: Stephen Mallette <[email protected]>
Date:   2017-05-24T17:38:16Z

    TINKERPOP-1676 Removed properties from graphson serialization of Path
    
    Path should not have properties on elements (if they are present). That is 
inconsistent with gryo and was unintentially allowed.

commit 049c979720a84ea9744725fda5f8d2b371ab1751
Author: Stephen Mallette <[email protected]>
Date:   2017-05-25T17:29:45Z

    TINKERPOP-1676 More optimizations to GraphSON serializers
    
    Added some tests where there were previously gaps.

commit f32d725a080d39ef7bf93c68d8080939f622cb37
Author: Stephen Mallette <[email protected]>
Date:   2017-05-25T18:03:29Z

    TINKERPOP-1676 Cleaned up serialization with detached
    
    The DetachedUtil was just there to test out if the performance changes 
would help and it did it's job nicely, but it was kinda ugly and hung some 
methods out there in weird way. Cleaned that up with some builder pattern on 
the detached classes themselves.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to