[ 
https://issues.apache.org/jira/browse/STORM-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493736#comment-16493736
 ] 

Stig Rohde Døssing commented on STORM-3088:
-------------------------------------------

If I'm understanding you correctly, you want a public static method somewhere 
to fetch Storm's Kryo instance/a copy of Storm's Kryo instance?

I think you can implement this using worker hooks 
[https://storm.apache.org/releases/1.2.1/javadocs/org/apache/storm/hooks/BaseWorkerHook.html#start-java.util.Map-org.apache.storm.task.WorkerTopologyContext-].
 The hook's start method is called before any components start in the worker, 
and it is passed the topology config. You can then expose the config through 
a public static field/getter in the hook that your library can call. You can 
add a worker hook via the TopologyBuilder.addWorkerHook method.
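
A minimal sketch of the static-holder part of that idea follows. To keep it compilable without Storm on the classpath, the class does not extend BaseWorkerHook and its start method takes only the config Map; in a real topology it would extend BaseWorkerHook and override start(Map, WorkerTopologyContext). The class name ConfigCapturingHook and the getter name getTopologyConf are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Stand-in for a class that would extend org.apache.storm.hooks.BaseWorkerHook.
// The name ConfigCapturingHook is hypothetical.
final class ConfigCapturingHook {

    // Written by start() in the worker JVM, read by library code afterwards.
    // volatile so that executor threads see the value set by the startup thread.
    private static volatile Map<String, Object> topologyConf;

    // In Storm this would be the override of
    // BaseWorkerHook.start(Map, WorkerTopologyContext); the context argument
    // is left out here so the sketch has no Storm dependency.
    public void start(Map<String, Object> conf) {
        // Keep a read-only copy so callers cannot alter the captured config.
        topologyConf = Collections.unmodifiableMap(new HashMap<>(conf));
    }

    // Library code calls this instead of asking each component for the config.
    public static Map<String, Object> getTopologyConf() {
        Map<String, Object> conf = topologyConf;
        if (conf == null) {
            throw new IllegalStateException("worker hook has not been started yet");
        }
        return conf;
    }
}
```

With the config captured this way, the library could pass the map to SerializationFactory.getKryo() itself, as the issue description suggests, without components having to hand it over in open/prepare.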

> Request to get storm's Kryo configuration for other use
> -------------------------------------------------------
>
>                 Key: STORM-3088
>                 URL: https://issues.apache.org/jira/browse/STORM-3088
>             Project: Apache Storm
>          Issue Type: Wish
>          Components: storm-core
>            Reporter: David Willcox
>            Priority: Minor
>
> In short, I'd like a way to get a Kryo serializer for "private" use that has 
> the same configuration used by storm for serializing inter-worker tuples. 
> Ideally, that would be a version of SerializationFactory.getKryo() that 
> returned a Kryo object with the same configuration of storm's, but that 
> doesn't exist.
> Obviously, we can pass the topology configuration Map to getKryo(), but 
> there's no way for a library, whose internal workings should be opaque to 
> storm components, to get access to that Map.
> We've worked around this by adding an initialization call in our components' 
> open/prepare methods, but that's just icky. It would be much cleaner if the 
> library could handle this on its own without bothering component developers 
> that shouldn't have to deal with this.
> I can think of several solutions, any of which would be acceptable. Some 
> would probably be useful for cases other than ours.
>  * As mentioned above, a variant of SerializationFactory.getKryo() that 
> returned a Kryo object with the same configuration as storm's.
>  * An API that could be called anywhere that returned the Map passed to 
> components' open/prepare methods.
>  * A mechanism to allow registering an initialization function to be called 
> on worker startup. It would be passed the above-mentioned Map, and all 
> initializers would be called before the worker started any components. (I 
> kind of like this one best. Seems most flexible.)
> Background:
> We have a custom Kryo serializer for our events that implements lazy 
> deserialization. Most tuples have just a single event object. On 
> serialization, some fields of the event are serialized using Kryo, others 
> with a more primitive method. But on deserialization on receipt in a 
> worker, no fields are actually deserialized; fields are only deserialized 
> when referenced by the receiving bolt. On re-serialization for output, only 
> fields modified within the worker are serialized. Since a large majority of 
> fields in our events never change as they flow through multiple bolts, this 
> saves considerable CPU in serialization/deserialization.
> The issue is: When the event is serialized by storm, we use storm's Kryo 
> serializer. But deserialization of a field may happen when a bolt references 
> a serialized field, and at that point we don't have storm's Kryo, only one we 
> created ourselves. Ensuring the two Kryos are configured the same requires 
> access to the storm configuration Map.
> Like I say, we've hacked around this issue, but would prefer a cleaner 
> solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
