[GitHub] incubator-tinkerpop issue #325: TINKERPOP-1321 Introduce Kryo shim to suppor...

dalaro Mon, 06 Jun 2016 01:28:23 -0700

Github user dalaro commented on the issue:

    https://github.com/apache/incubator-tinkerpop/pull/325
  
    I just pushed some changes that I hacked together this weekend.  The key 
additions are:
    
    * `TinkerPopKryoRegistrator`, which I extracted from my app, and which acts 
as a `spark.kryo.registrator` impl that knows about TinkerPop types
    * `IoRegistryAwareKryoSerializer`, which is a Spark `Serializer` that looks 
for `GryoPool.CONFIG_IO_REGISTRY` and applies it if present
    * `KryoShimLoaderService.applyConfiguration(cfg)`, which replaces direct 
calls to `HadoopPools.initialize(cfg)` and adds equivalent functionality for 
initializing the unshaded Kryo serializer pool
    
    The user would theoretically just set
    
    ```
    spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.IoRegistryAwareKryoSerializer
    spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.TinkerPopKryoRegistrator
    # Optional, only needed for custom types
    gremlin.io.registry=whatever.user.IoRegistryImpl
    ```
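
    For anyone who builds these settings in code rather than in a properties file, here is a minimal sketch using only `java.util.Properties`. The keys and class names are copied from the block above; the `GryoSparkSettings` helper is hypothetical, and how the resulting properties get handed to Spark is left to the application.

```java
import java.util.Properties;

// Hypothetical helper, not part of TinkerPop: collects the settings from the comment above.
final class GryoSparkSettings {
    // ioRegistryClass may be null when no custom types need registering.
    static Properties build(String ioRegistryClass) {
        Properties p = new Properties();
        p.setProperty("spark.serializer",
            "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.IoRegistryAwareKryoSerializer");
        p.setProperty("spark.kryo.registrator",
            "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.TinkerPopKryoRegistrator");
        if (ioRegistryClass != null) {
            // Optional, only needed for custom types
            p.setProperty("gremlin.io.registry", ioRegistryClass);
        }
        return p;
    }
}
```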
    
    In practice, when I have a custom gremlin.io.registry, I have always had to 
take the additional step (long before this PR) of forcibly initializing 
`HadoopPools` before touching SparkGraphComputer in my app, or else some part 
of Spark -- I think the closure serializer -- would attempt to use HadoopPools 
via ObjectWritable/VertexWritable before initialization and produce garbage on 
my custom classes.  **This problem predates my PR**. I'm not trying to solve it 
here, in part because I still don't know whether it's a pathology specific to my 
app or a sign that TinkerPop is missing a crucial `HadoopPools.initialize` (now, 
equivalently, `KryoShimLoaderService.applyConfiguration`) call somewhere, and 
in part because HadoopPools is such a hideous architectural wart that the 
ultimate solution probably involves destroying it.
    
    In the past, I've worked around this by defining a custom spark.serializer 
that delegates newKryo() to a GryoSerializer/IoRegistryAwareKryoSerializer, but 
which has a constructor that invokes 
`HadoopPools.initialize`/`KryoShimLoaderService.applyConfiguration` (relying on 
that method's idempotence).
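
    The shape of that workaround can be sketched without Spark or TinkerPop on the classpath. Every type below is a hypothetical stand-in: `PoolInitializer` plays the role of `HadoopPools.initialize`/`KryoShimLoaderService.applyConfiguration`, `KryoFactory` stands in for a serializer exposing `newKryo()`, and `DelegateSerializer` stands in for GryoSerializer. In a real app the wrapper's constructor would presumably receive Spark's configuration object rather than a `Map`, and an uninitialized pool would produce garbage rather than throw; the exception here just makes the ordering problem visible.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the shared serializer pool's initialization entry point.
final class PoolInitializer {
    private static final AtomicReference<Map<String, String>> CONFIG = new AtomicReference<>();
    private static final AtomicInteger EFFECTIVE_INITS = new AtomicInteger();

    // Idempotent: only the first call takes effect, later calls are no-ops.
    static void applyConfiguration(Map<String, String> cfg) {
        if (CONFIG.compareAndSet(null, cfg)) {
            EFFECTIVE_INITS.incrementAndGet();
        }
    }

    static boolean isInitialized() { return CONFIG.get() != null; }

    static int effectiveInits() { return EFFECTIVE_INITS.get(); }
}

// Hypothetical stand-in for a serializer that exposes newKryo().
interface KryoFactory {
    Object newKryo();
}

// Stand-in for the serializer being wrapped; it depends on the pool being ready.
final class DelegateSerializer implements KryoFactory {
    @Override
    public Object newKryo() {
        if (!PoolInitializer.isInitialized()) {
            throw new IllegalStateException("pool used before initialization");
        }
        return new Object(); // placeholder for a configured Kryo instance
    }
}

// The workaround: force pool initialization in the constructor, then delegate.
final class InitializingSerializer implements KryoFactory {
    private final KryoFactory delegate;

    InitializingSerializer(Map<String, String> cfg) {
        PoolInitializer.applyConfiguration(cfg); // safe to repeat because it is idempotent
        this.delegate = new DelegateSerializer();
    }

    @Override
    public Object newKryo() {
        return delegate.newKryo();
    }
}
```

    Because initialization happens in the constructor, it runs wherever the serializer is instantiated, and constructing the serializer more than once does not reinitialize the pool.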
    
    Again, this initialization step may just be specific to my app and unnecessary 
for the average TinkerPop user.  It's possible that the config I pasted above 
will work for others.
    
    FWIW, this passes, so the overrides bug should be fixed along with all this 
refactoring stuff:
    
    ```
    mvn clean install -DskipTests=true && mvn verify -pl gremlin-server -DskipIntegrationTests=false -Dtest.single=GremlinResultSetIntegrateTest
    ```



