Github user dalaro commented on the issue:

    https://github.com/apache/incubator-tinkerpop/pull/325

These questions involved a lot of back-and-forth that would have been slow to resolve in GitHub comments, so Marko and I discussed the failures and the code on a hangout earlier today. Summary:

* Marko created https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP-1321 on the TP repo. That branch is my PR plus additional tweaks Marko and I worked out on top, and it supersedes this PR branch. This PR branch **should not be merged, because Marko's branch subsumes and extends it**.
* I have a rough idea of what changed from our discussion on the hangout, but I will still go through the new commits on his branch and review them.
* The new serialization code takes lower priority than the old code: it has to be turned on through explicit user action, as in this snippet from the test provider code:

  ```
  System.setProperty(SHIM_CLASS_SYSTEM_PROPERTY, UnshadedKryoShimService.class.getCanonicalName());
  config.put("spark.serializer", KryoSerializer.class.getCanonicalName());
  config.put("spark.kryo.registrator", GryoRegistrator.class.getCanonicalName());
  ```

* Marko set up SparkHadoopGraphProvider to randomly choose between the old serialization code and the new code in https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-afffc6796745845d46e6f60ea3a992f9R91. IMHO that choice should be overridable so runs can be made deterministic, but it's not a huge deal since it's limited to test code.
* We replaced `GryoClassResolver` with a series of class registrations in https://github.com/apache/incubator-tinkerpop/commit/e7003635e27c625b3f30492111f20f4fe4e24eb5#diff-2e881d0a6ab7ed1ca7d1f593bceadfd2R207. The idea is to explicitly configure which types need a detaching serializer rather than relying on `GryoClassResolver` to do it automagically, regardless of explicit class registrations. Three factors influenced this choice:
  * Spark's KryoSerializer does not support a user-provided ClassResolver. Working around that limitation would mean either getting a change through Spark upstream or copying and pasting a bunch of KryoSerializer's private/hardcoded class registrations, some of which could change in future Spark versions. The latter is ugly and would make the code brittle across Spark upgrades.
  * `GryoClassResolver` is convenient but somewhat mysterious to third parties, who are likelier to have seen a custom class registration mechanism (like TP's `IoRegistry` or Spark's `spark.kryo.registrator`) than a custom resolver. It also removes vendor control, in that its detachment behavior is not configurable; modifying it requires subclassing or reimplementation. For instance, if a vendor had a lightweight Vertex implementation with a custom serializer and wanted to bypass TP's resolver-driven detachment, I don't think that would be possible without modifying TP's resolver. (A rough sketch of what an explicit vendor registration could look like follows at the end of this comment.)
  * `GryoClassResolver` is written against a bunch of shaded types and relies on `readVarInt`/`writeVarInt`, which were added in Kryo 2.22, one version too late for compatibility with Spark's Kryo 2.21. This is the least important concern of the three, more an implementation detail.
* Marko is running the full integration test suite now. He ran the Spark ITs just before pushing and reported success on those specifically.
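As a loose illustration of the explicit-registration approach described above (and of the vendor-control point), here is a minimal sketch of a Spark `KryoRegistrator` that registers a hypothetical lightweight vertex type with its own serializer. The `Vendor*` names are made up for the example; only Spark's `org.apache.spark.serializer.KryoRegistrator` interface and Kryo's `register(Class, Serializer)` API are assumed. A real setup would presumably also keep the stock TinkerPop registrations (e.g. by delegating to `GryoRegistrator`), which I haven't shown here.

```
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical vendor-side registrator; none of the Vendor* names exist in TP or Spark.
public class VendorKryoRegistrator implements KryoRegistrator {

    @Override
    public void registerClasses(final Kryo kryo) {
        // Explicit registration: the vendor decides which serializer handles its
        // lightweight vertex type, instead of a ClassResolver forcing detachment.
        kryo.register(VendorLightweightVertex.class, new VendorLightweightVertexSerializer());
    }

    // Placeholder vendor type; a real one would carry the vertex's id/label/properties.
    public static class VendorLightweightVertex {
        long id;
    }

    // Placeholder serializer showing where the vendor controls the wire format.
    public static class VendorLightweightVertexSerializer extends Serializer<VendorLightweightVertex> {
        @Override
        public void write(final Kryo kryo, final Output output, final VendorLightweightVertex vertex) {
            // Write only the fields the vendor wants on the wire.
            output.writeLong(vertex.id);
        }

        @Override
        public VendorLightweightVertex read(final Kryo kryo, final Input input, final Class<VendorLightweightVertex> clazz) {
            final VendorLightweightVertex vertex = new VendorLightweightVertex();
            vertex.id = input.readLong();
            return vertex;
        }
    }
}
```

Such a registrator would be wired in the same way as `GryoRegistrator` in the snippet above, i.e. via the `spark.kryo.registrator` property, in place of or alongside the TinkerPop one depending on what the vendor wants registered.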