Stig Rohde Døssing created STORM-2540:
-----------------------------------------

             Summary: Get rid of window compaction in WindowManager
                 Key: STORM-2540
                 URL: https://issues.apache.org/jira/browse/STORM-2540
             Project: Apache Storm
          Issue Type: Improvement
          Components: storm-client
    Affects Versions: 2.0.0
            Reporter: Stig Rohde Døssing
            Assignee: Stig Rohde Døssing


Storm's windowing support uses trigger and eviction policies to control the
size of the windows passed to WindowingBolts. The WindowManager has a
hard-coded limit of 100 tuples, after which it starts trying to evict tuples
from the window, probably as an attempt to avoid excessively large windows when
using time-based eviction policies. Whenever a tuple is added to the window,
the cap is checked, and if the number of tuples in the window exceeds it, the
WindowManager evaluates the EvictionPolicy against the tuples to determine
whether any can be removed. This mechanism is referred to below as compaction.
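
To make the mechanism concrete, here is a minimal, self-contained sketch of the
behavior described above. It is a simplified model and not the actual
WindowManager code: the class name CompactingWindowSketch, the constant name
COMPACTION_THRESHOLD, and the use of a Predicate in place of the EvictionPolicy
are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Simplified model of the compaction check; NOT the real WindowManager.
final class CompactingWindowSketch<T> {
    // The hard-coded cap of 100 tuples mentioned in the description.
    // The constant name is hypothetical.
    static final int COMPACTION_THRESHOLD = 100;

    private final Deque<T> queue = new ArrayDeque<>();
    // Stand-in for the EvictionPolicy: returns true if a tuple may be dropped.
    private final Predicate<T> shouldEvict;

    CompactingWindowSketch(Predicate<T> shouldEvict) {
        this.shouldEvict = shouldEvict;
    }

    // Adds a tuple, then checks the cap and compacts if it is exceeded.
    void add(T tuple) {
        queue.addLast(tuple);
        if (queue.size() > COMPACTION_THRESHOLD) {
            compact();
        }
    }

    // Asks the policy about every queued tuple and removes the evictable ones.
    private void compact() {
        queue.removeIf(shouldEvict);
    }

    int size() {
        return queue.size();
    }
}
```

If the policy never reports a tuple as evictable, the compaction pass removes
nothing and the window grows past the cap regardless, which is the situation
described for the watermark-based policies below.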

This hard cap is ineffective in most configurations and has a surprising
interaction with the count-based policy.

If the windowing bolt is configured to use timestamp fields in the tuples to
determine the current time, the WatermarkingXPolicy classes are used. In this
configuration, compaction has no effect because tuples cannot be evicted until
the WatermarkGenerator sends a new watermark, and when it does, the
TriggerPolicy causes the WindowManager to evict any expired tuples anyway.

If the windowing bolt is using the count-based policy, compaction has the
unexpected effect of capping the user's configured max count at 100. If the
configured count is less than 100, compaction again has no effect.

When the bolt is configured to use the policy based on tuple arrival time,
compaction only has an effect if there are tuples older than the configured
window duration, which only happens if the window triggers slightly late. This
can cause tuples to be evicted from the window before the user's bolt sees
them. Even when tuples are evicted by the compaction mechanism, they are kept
in memory until the next time a window is presented to the user's bolt.

I think the compaction mechanism should be removed. The only policy that
benefits is the time-based policy, and in that case it would be better to just
add a configurable max tuple count to that policy.
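
As a rough illustration of what a configurable max tuple count on the
time-based policy could look like, here is a self-contained sketch. It is one
possible interpretation, not a proposed patch: the class
BoundedTimeWindowSketch and the parameters windowLengthMs and maxTuples are
hypothetical, and it assumes tuples arrive with non-decreasing timestamps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical time-based window with an explicit, user-configured tuple cap.
final class BoundedTimeWindowSketch<T> {
    // A tuple paired with its arrival time in milliseconds.
    private static final class Entry<T> {
        final T value;
        final long arrivalMs;

        Entry(T value, long arrivalMs) {
            this.value = value;
            this.arrivalMs = arrivalMs;
        }
    }

    private final Deque<Entry<T>> entries = new ArrayDeque<>();
    private final long windowLengthMs;
    private final int maxTuples;

    BoundedTimeWindowSketch(long windowLengthMs, int maxTuples) {
        if (windowLengthMs <= 0 || maxTuples <= 0) {
            throw new IllegalArgumentException("window length and max tuples must be positive");
        }
        this.windowLengthMs = windowLengthMs;
        this.maxTuples = maxTuples;
    }

    // Records a tuple at the given arrival time and applies both limits.
    void add(T value, long nowMs) {
        entries.addLast(new Entry<>(value, nowMs));
        evict(nowMs);
    }

    // Drops the oldest tuples while the window is over the configured count
    // or while the oldest tuple is older than the window length.
    private void evict(long nowMs) {
        while (!entries.isEmpty()
                && (entries.size() > maxTuples
                    || nowMs - entries.peekFirst().arrivalMs > windowLengthMs)) {
            entries.removeFirst();
        }
    }

    // Returns the tuples currently in the window, oldest first.
    List<T> snapshot(long nowMs) {
        evict(nowMs);
        List<T> result = new ArrayList<>(entries.size());
        for (Entry<T> e : entries) {
            result.add(e.value);
        }
        return result;
    }
}
```

The point of the sketch is that the cap becomes an explicit setting chosen by
the user rather than a fixed value applied to every policy.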



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
