[ 
https://issues.apache.org/jira/browse/STORM-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-7:
-----------------------------
    Component/s: storm-core

> storm.trident.operation.Aggregator: include group information in init() method
> ------------------------------------------------------------------------------
>
>                 Key: STORM-7
>                 URL: https://issues.apache.org/jira/browse/STORM-7
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> Reported by @lorenzfischer
> To be able to share resources between different groups in a grouped 
> aggregator, it would be helpful to have information about the group available 
> in the init() method of the aggregator interface.
> The concrete use case is the following:
> For our project we need to count the number of unique values in a field of a 
> grouped stream. We have hundreds of millions of unique values and millions of 
> grouped values. For this reason, we're currently deploying the HyperLogLog 
> class that has generously been made available by the people at Clearspring 
> (https://github.com/clearspring/stream-lib). Naturally, we end up with 
> millions of counter objects.
> The DSI-Utils library (http://dsiutils.di.unimi.it) offers a 
> HyperLogLogCounterArray class that reduces the overhead incurred by this many 
> HLL objects. We're struggling with the implementation in Trident, though, as 
> the init(Object batchId, TridentCollector collector) method of the Aggregator 
> interface does not provide any information about the current "group" the 
> aggregator should be initialized for.
> (This was initially posted on Google Groups: 
> https://groups.google.com/forum/#!topic/storm-user/dthUfkMRNhU)
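
To make the request concrete, here is a minimal sketch of what a group-aware init() might look like. It is not Storm code: GroupAwareAggregator, SharedCounters and SharedCountAggregator are hypothetical names invented for illustration, and the shared structure is a plain array of long counters standing in for something like HyperLogLogCounterArray (it counts values, not distinct values). The only point shown is that, once init() receives the group key, it can return a slot index into one shared structure instead of allocating a counter object per group.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface, NOT part of Storm: it mirrors the shape of a
// Trident aggregator but also hands the grouping key to init().
interface GroupAwareAggregator<S> {
    S init(Object batchId, List<Object> group);
    void aggregate(S state, Object value);
    long complete(S state);
}

// Stand-in for a shared counter array. Each slot is an exact long counter
// so the sketch stays runnable without external libraries.
final class SharedCounters {
    private final Map<List<Object>, Integer> slots = new HashMap<>();
    private long[] counts = new long[16];

    // Returns the slot for a group, assigning the next free one on first use.
    synchronized int slotFor(List<Object> group) {
        Integer slot = slots.get(group);
        if (slot == null) {
            slot = slots.size();
            if (slot == counts.length) {
                counts = Arrays.copyOf(counts, counts.length * 2);
            }
            slots.put(group, slot);
        }
        return slot;
    }

    synchronized void increment(int slot) { counts[slot]++; }

    synchronized long get(int slot) { return counts[slot]; }
}

// The per-group state is just an Integer slot index into the shared array.
final class SharedCountAggregator implements GroupAwareAggregator<Integer> {
    private final SharedCounters shared;

    SharedCountAggregator(SharedCounters shared) { this.shared = shared; }

    @Override
    public Integer init(Object batchId, List<Object> group) {
        return shared.slotFor(group);
    }

    @Override
    public void aggregate(Integer slot, Object value) {
        shared.increment(slot);
    }

    @Override
    public long complete(Integer slot) {
        return shared.get(slot);
    }
}
```

With the init(Object batchId, TridentCollector collector) signature quoted above, the slotFor() lookup has no key to work with, which is the gap this issue describes.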



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
