[ 
https://issues.apache.org/jira/browse/STORM-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033374#comment-16033374
 ] 

Stig Rohde Døssing commented on STORM-2540:
-------------------------------------------

I've changed my mind. The compaction mechanism is likely harmful in most 
cases, but I noticed that the Trident windowing API lets the user combine any 
type of TriggerPolicy with any type of EvictionPolicy. I had assumed that the 
two policies were always "paired", i.e. of the same type.

I don't want to work on this, because I don't have enough experience with the 
windowing API to know whether mixing policies makes sense, or in which cases 
compaction may be necessary.

> Get rid of window compaction in WindowManager
> ---------------------------------------------
>
>                 Key: STORM-2540
>                 URL: https://issues.apache.org/jira/browse/STORM-2540
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-client
>    Affects Versions: 2.0.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>
> Storm's windowing support uses trigger and eviction policies to control the 
> size of the windows passed to WindowingBolts. The WindowManager has a 
> hard-coded limit of 100 tuples before tuples start getting evicted from the 
> window, probably as an attempt to avoid overly large windows when using 
> time-based eviction policies. Whenever a tuple is added to the window, the 
> hard cap is checked, and if the number of tuples in the window exceeds the 
> cap, the WindowManager evaluates the EvictionPolicy for the tuples to figure 
> out whether some can be removed.
>
> This hard cap is ineffective in most configurations, and has a surprising 
> interaction with the count-based policy.
>
> If the windowing bolt is configured to use timestamp fields in the tuples to 
> determine the current time, the WatermarkingXPolicy classes are used. In this 
> configuration, compaction does nothing, because tuples cannot be evicted 
> until the WatermarkGenerator sends a new watermark, and when it does, the 
> TriggerPolicy causes the WindowManager to evict any expired tuples anyway.
>
> If the windowing bolt is using the count-based policy, compaction has the 
> unexpected effect of hard-capping the user's configured max count at 100. If 
> the configured count is less than 100, compaction again has no effect.
>
> When the bolt is configured to use the policy based on tuple arrival time, 
> compaction only has an effect if there are tuples older than the configured 
> window duration, which only happens if the window triggers slightly late. 
> This can cause tuples to be evicted from the window before the user's bolt 
> sees them. Even when tuples are evicted by the compaction mechanism, they 
> are kept in memory until the next time a window is presented to the user's 
> bolt.
>
> I think the compaction mechanism should be removed. The only policy that 
> benefits is the time-based policy, and in that case it would be better to 
> just add a configurable max tuple count to that policy.
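
To illustrate the late-trigger case described above for the arrival-time 
policy, here is a minimal sketch in Java. It is a simplified model under 
stated assumptions, not Storm's actual WindowManager: the class 
CompactionSketch, the constant COMPACTION_CAP, the TimeEvictionPolicy helper 
and all method names are hypothetical, and the only behavior modeled is "after 
an add, if the window holds more than 100 tuples, evaluate the eviction policy 
for the tuples".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the compaction described in the issue.
// This is NOT Storm's WindowManager; all names here are made up.
public class CompactionSketch {
    // The hard cap of 100 tuples mentioned in the issue description.
    static final int COMPACTION_CAP = 100;

    // Simplified stand-in for an eviction policy based on tuple arrival time.
    static final class TimeEvictionPolicy {
        final long windowMillis;

        TimeEvictionPolicy(long windowMillis) {
            this.windowMillis = windowMillis;
        }

        // A tuple is expired once it is at least windowMillis old.
        boolean shouldEvict(long tupleTs, long now) {
            return now - tupleTs >= windowMillis;
        }
    }

    final Deque<Long> window = new ArrayDeque<>();          // arrival timestamps
    final List<Long> evictedByCompaction = new ArrayList<>();
    final TimeEvictionPolicy policy;

    CompactionSketch(TimeEvictionPolicy policy) {
        this.policy = policy;
    }

    // Add a tuple, then check the hard cap as the issue describes.
    void add(long arrivalTs) {
        window.addLast(arrivalTs);
        if (window.size() > COMPACTION_CAP) {
            compact(arrivalTs);
        }
    }

    // Evaluate the eviction policy; tuples are held in arrival order,
    // so we can stop at the first tuple that has not expired.
    private void compact(long now) {
        while (!window.isEmpty() && policy.shouldEvict(window.peekFirst(), now)) {
            evictedByCompaction.add(window.pollFirst());
        }
    }

    // What the user's bolt would be handed when the window finally triggers.
    List<Long> trigger() {
        return new ArrayList<>(window);
    }

    // 100 tuples arrive within the first 100 ms; the trigger is late, so one
    // more tuple arrives at 1050 ms before any window has been emitted.
    static CompactionSketch runLateTriggerScenario() {
        CompactionSketch manager = new CompactionSketch(new TimeEvictionPolicy(1_000));
        for (long ts = 0; ts < 100; ts++) {
            manager.add(ts);
        }
        manager.add(1_050);
        return manager;
    }

    public static void main(String[] args) {
        CompactionSketch manager = runLateTriggerScenario();
        System.out.println("evicted before trigger: " + manager.evictedByCompaction.size());
        System.out.println("tuples the bolt sees:   " + manager.trigger().size());
    }
}
```

In this model the 101st add pushes the window over the cap, and the tuples 
with timestamps 0 to 50 (51 of them) are evicted before any window has been 
handed to the bolt, so the bolt only ever sees the remaining 50. The 
evictedByCompaction list also loosely mirrors the observation that evicted 
tuples stay in memory until the next window is emitted.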



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
