[ https://issues.apache.org/jira/browse/STORM-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493736#comment-16493736 ]
Stig Rohde Døssing commented on STORM-3088:
-------------------------------------------

If I'm understanding you correctly, you want a public static method somewhere to fetch Storm's Kryo instance/a copy of Storm's Kryo instance?

I think you can implement this using worker hooks [https://storm.apache.org/releases/1.2.1/javadocs/org/apache/storm/hooks/BaseWorkerHook.html#start-java.util.Map-org.apache.storm.task.WorkerTopologyContext-]. The hook's start method is called before any components start in the worker, and it is passed the topology config. You can then add a public static field/getter to the hook that your library can call. You can add a worker hook via the TopologyBuilder.addWorkerHook method.

> Request to get storm's Kryo configuration for other use
> -------------------------------------------------------
>
>                 Key: STORM-3088
>                 URL: https://issues.apache.org/jira/browse/STORM-3088
>             Project: Apache Storm
>          Issue Type: Wish
>          Components: storm-core
>            Reporter: David Willcox
>            Priority: Minor
>
> In short, I'd like a way to get a Kryo serializer for "private" use that has
> the same configuration used by storm for serializing inter-worker tuples.
> Ideally, that would be a version of SerializationFactory.getKryo() that
> returned a Kryo object with the same configuration of storm's, but that
> doesn't exist.
> Obviously, we can pass the topology configuration Map to getKryo(), but
> there's no way for a library, whose internal workings should be opaque to
> storm components, to get access to that Map.
> We've worked around this by adding an initialization call in our components'
> open/prepare methods, but that's just icky. It would be much cleaner if the
> library could handle this on its own without bothering component developers
> that shouldn't have to deal with this.
> I can think of several solutions, any of which would be acceptable. Some
> would probably be useful for cases other than ours.
> * As mentioned above, a variant of SerializationFactory.getKryo() that
> returned a Kryo object with the same configuration as storm's.
> * An API that could be called anywhere that returned the Map passed to
> components' open/prepare methods.
> * A mechanism to allow registering an initialization function to be called
> on worker startup. It would be passed the above-mentioned Map, and all
> initializers would be called before the worker started any components. (I
> kind of like this one best. Seems most flexible.)
> Background:
> We have a custom Kryo serializer for our events that implements lazy
> deserialization. Most tuples have just a single event object. On
> serialization, some fields of the event are serialized using Kryo, others
> with a more primitive method. But on deserialization on receipt in a
> worker, no fields are actually deserialized; fields are only deserialized
> when referenced by the receiving bolt. On re-serialization for output, only
> fields modified within the worker are serialized. Since a large majority of
> fields in our events never change as they flow through multiple bolts, this
> saves considerable CPU in serialization/deserialization.
> The issue is: When the event is serialized by storm, we use storm's Kryo
> serializer. But deserialization of a field may happen when a bolt references
> a serialized field, and at that point we don't have storm's Kryo, only one we
> created ourselves. Ensuring the two Kryos are configured the same requires
> access to the storm configuration Map.
> Like I say, we've hacked around this issue, but would prefer a cleaner
> solution.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
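
For illustration, the worker-hook suggestion in the comment above could look roughly like the sketch below. It shows only the static holder that a hook would fill in and that a library would read from; the names TopologyConfHolder and ConfCapturingHook are hypothetical and not part of Storm. The hook wiring appears as comments, since it assumes the BaseWorkerHook.start(Map, WorkerTopologyContext) signature from the 1.2.1 javadoc linked in the comment, and has not been tested against a running topology.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical holder (not part of Storm) for the topology configuration.
 * A worker hook would call set(...) from its start method; library code
 * would call get() later to obtain the same Map that components receive.
 */
final class TopologyConfHolder {
    // volatile so a value stored by the worker's startup thread is visible
    // to the threads that later run spouts and bolts
    private static volatile Map<String, Object> conf;

    private TopologyConfHolder() {}

    /** Stores an unmodifiable snapshot of the given configuration. */
    public static void set(Map<String, Object> topologyConf) {
        conf = Collections.unmodifiableMap(new HashMap<>(topologyConf));
    }

    /** Returns the stored configuration, or fails if no hook has run yet. */
    public static Map<String, Object> get() {
        Map<String, Object> current = conf;
        if (current == null) {
            throw new IllegalStateException(
                "Topology config not set; was the worker hook registered?");
        }
        return current;
    }
}

// Assumed wiring (hypothetical class name, Storm 1.2.1 signature):
//
//   public class ConfCapturingHook extends BaseWorkerHook {
//       @Override
//       public void start(Map stormConf, WorkerTopologyContext context) {
//           TopologyConfHolder.set(stormConf);
//       }
//   }
//
//   // when building the topology:
//   builder.addWorkerHook(new ConfCapturingHook());
```

The library could then pass TopologyConfHolder.get() to SerializationFactory.getKryo() whenever it needs a Kryo configured like Storm's. Kryo instances are not thread-safe, so the library would probably want one instance per thread rather than a single shared one.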