[ https://issues.apache.org/jira/browse/TINKERPOP-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025273#comment-16025273 ]
ASF GitHub Bot commented on TINKERPOP-1676:
-------------------------------------------

GitHub user spmallette opened a pull request:

    https://github.com/apache/tinkerpop/pull/613

    TINKERPOP-1676 (tp32) GraphSON Performance

https://issues.apache.org/jira/browse/TINKERPOP-1676

Made some pretty good improvements to performance. I was mostly focused on the serialization of complete graph elements, so using TinkerGraph as the basis for the benchmark seemed like a good way to do that. I subgraphed a portion of the Grateful Dead graph:

```text
gremlin> subgraph = g.E().hasLabel('followedBy').subgraph('subGraph').cap('subGraph').next()
==>tinkergraph[vertices:338 edges:7047]
```

I wanted a comparison with Gryo, so I got that first:

```text
gremlin> gryo = graph.io(IoCore.gryo()).mapper().create().createMapper()
==>org.apache.tinkerpop.shaded.kryo.Kryo@5edf2821
gremlin> clock {
......1>   def s = new java.io.ByteArrayOutputStream()
......2>   def output = new org.apache.tinkerpop.shaded.kryo.io.Output(s)
......3>   gryo.writeObject(output, subgraph)
......4>   output.flush()
......5>   def input = new org.apache.tinkerpop.shaded.kryo.io.Input(s.toByteArray())
......6>   gryo.readObject(input, TinkerGraph.class)
......7> }
==>55.96791679999999
```

Then I compared against the head of tp32:

```text
gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@185f7840
gremlin> clock {
......1>   def s = new java.io.ByteArrayOutputStream()
......2>   graphson.writeValue(s, subgraph)
......3>   graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
......4> }
==>723.1702541799999
```

Pretty bad. Now GraphSON with the changes from this branch:

```text
gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@563a89b5
gremlin> clock {
......1>   def s = new java.io.ByteArrayOutputStream()
......2>   graphson.writeValue(s, subgraph)
......3>   graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
......4> }
==>67.0219281
```

Not too far behind Gryo now! I'm running the full integration tests and will update this PR once they complete successfully.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #613

----
commit 46e6e9767b6818f33b56366ee15d80cec5a908e0
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-23T20:03:17Z

    TINKERPOP-1676 Got rid of stream() usage

    Can't believe we still had stream() in here. Will it ever all be gone from these performance sensitive places?! Anyway, removed that and deprecated a constructor on DetachedEdge that was using Pair for no really good reason. No need to create extra Pair objects for that. They just sorta get thrown away after usage.

commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-23T20:57:12Z

    TINKERPOP-1676 Improved speed and memory usage of GraphSON

    This change is specific to TinkerGraph and the serialization of vertices/edges/properties. Removed a "middle layer" of JSON to Object serialization which was coercing JSON to Map first and then converting that to the graph elements. Also made deserializers cacheable.

commit 02b007366b434918fe1181f99d80689c1c03684b
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-24T00:13:46Z

    TINKERPOP-1676 Performance enhancement to graphson serialization

    Focuses on speeding up serialization of graph elements. Prevent use of generic maps and stream graphson data directly into "detached" elements. Not using intermediate maps cut down on memory usage and a bunch of jackson reflection calls (still not sure what they were for and why they were not made to be more efficient). It did mean some ugly changes to "detached" stuff. Will need to maybe refactor some more, but the basic premise seems to be proven.

commit e5d2d2bc000654fd026521219cf30bac265139a3
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-24T17:38:16Z

    TINKERPOP-1676 Removed properties from graphson serialization of Path

    Path should not have properties on elements (if they are present). That is inconsistent with gryo and was unintentionally allowed.

commit 049c979720a84ea9744725fda5f8d2b371ab1751
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-25T17:29:45Z

    TINKERPOP-1676 More optimizations to GraphSON serializers

    Added some tests where there were previously gaps.

commit f32d725a080d39ef7bf93c68d8080939f622cb37
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-25T18:03:29Z

    TINKERPOP-1676 Cleaned up serialization with detached

    The DetachedUtil was just there to test out if the performance changes would help and it did its job nicely, but it was kinda ugly and hung some methods out there in a weird way. Cleaned that up with some builder pattern on the detached classes themselves.

----

> Improve GraphSON 2.0 Performance
> --------------------------------
>
>                 Key: TINKERPOP-1676
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1676
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.2.4
>            Reporter: stephen mallette
>            Assignee: stephen mallette
>              Labels: breaking
>
> Not sure what will happen here. Just wanted an issue where I could experiment
> a bit and gather feedback from others for collaboration in case others
> had ideas. Right now, GraphSON reads are considerably slower than Gryo and
> writes appear to be faster, so the focus will be on how to get reads more in
> line with what Gryo can do.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
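
The "Got rid of stream() usage" commit in the list above replaces stream pipelines with plain loops in performance-sensitive code. A minimal sketch of that kind of rewrite might look like the following. It is not the TinkerPop code: `StreamVsLoop`, `Prop`, `keysViaStream`, and `keysViaLoop` are hypothetical names made up for illustration, and whether the loop is measurably faster depends on the JVM and the call site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamVsLoop {
    // Hypothetical stand-in for a property key/value pair; not a TinkerPop class.
    static final class Prop {
        final String key;
        final Object value;

        Prop(String key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stream version: concise, but sets up a new pipeline on every call.
    static List<String> keysViaStream(List<Prop> props) {
        return props.stream().map(p -> p.key).collect(Collectors.toList());
    }

    // Loop version: same result, using a pre-sized list and no pipeline objects.
    static List<String> keysViaLoop(List<Prop> props) {
        List<String> keys = new ArrayList<>(props.size());
        for (Prop p : props) {
            keys.add(p.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Prop> props = Arrays.asList(new Prop("name", "example"), new Prop("weight", 1));
        // Both versions are expected to produce the same list of keys.
        System.out.println(keysViaStream(props).equals(keysViaLoop(props)));
    }
}
```

Both methods return the keys in list order, so the loop can be swapped in without changing behavior; any gain would need to be confirmed with a benchmark such as the `clock` runs above.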