Stig Rohde Døssing created STORM-2540:
-----------------------------------------
Summary: Get rid of window compaction in WindowManager
Key: STORM-2540
URL: https://issues.apache.org/jira/browse/STORM-2540
Project: Apache Storm
Issue Type: Improvement
Components: storm-client
Affects Versions: 2.0.0
Reporter: Stig Rohde Døssing
Assignee: Stig Rohde Døssing
Storm's windowing support uses trigger and eviction policies to control the
size of the windows passed to WindowingBolts. The WindowManager has a
hard-coded limit of 100 tuples, after which tuples become candidates for
eviction from the window, probably as an attempt to avoid excessively large
windows when using time-based eviction policies. Whenever a tuple is added to
the window, the hard cap is checked, and if the number of tuples in the window
exceeds the cap, the WindowManager evaluates the EvictionPolicy for the tuples
to figure out whether some can be removed.
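To make the mechanism concrete, here is a minimal sketch of the check described
above. It is a simplified model, not Storm's actual WindowManager: the class
name, the COMPACTION_THRESHOLD constant name, and the Predicate standing in for
an EvictionPolicy are all assumptions made for illustration. Only the cap of
100 and the "check on every add" behaviour come from the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Simplified model of the compaction check; not Storm's actual WindowManager.
class CompactingWindowSketch<T> {
    // The issue states the cap is 100; the constant name here is made up.
    static final int COMPACTION_THRESHOLD = 100;

    private final Deque<T> window = new ArrayDeque<>();
    // Tuples removed by compaction stay referenced until the next window fires.
    private final List<T> evictedButRetained = new ArrayList<>();
    // Stand-in for an EvictionPolicy: returns true when a tuple may be dropped.
    private final Predicate<T> evictable;

    CompactingWindowSketch(Predicate<T> evictable) {
        this.evictable = evictable;
    }

    void add(T tuple) {
        window.addLast(tuple);
        // The cap is checked on every add.
        if (window.size() > COMPACTION_THRESHOLD) {
            compact();
        }
    }

    // Ask the policy about each tuple and move the evictable ones aside.
    private void compact() {
        Iterator<T> it = window.iterator();
        while (it.hasNext()) {
            T tuple = it.next();
            if (evictable.test(tuple)) {
                it.remove();
                evictedButRetained.add(tuple);
            }
        }
    }

    int windowSize() {
        return window.size();
    }

    int retainedCount() {
        return evictedButRetained.size();
    }
}
```

With a policy that never allows eviction (t -> false), the window simply grows
past 100 and compaction does nothing, which is the situation described for the
watermark-based configurations.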
This hard cap is ineffective in most configurations and has a surprising
interaction with the count-based policy.
If the windowing bolt is configured to use timestamp fields in the tuples to
determine the current time, the WatermarkingXPolicy classes are used. In this
configuration, compaction has no effect because tuples cannot be evicted until
the WatermarkGenerator sends a new watermark, and when it does, the
TriggerPolicy causes the WindowManager to evict any expired tuples anyway.
If the windowing bolt is using the count-based policy, compaction has the
unexpected effect of capping the user's configured max count at 100. If the
configured count is less than 100, compaction again has no effect.
When the bolt is configured to use the policy based on tuple arrival time,
compaction only has an effect if there are tuples older than the configured
window duration, which only happens if the window triggers slightly late. This
can cause tuples to be evicted from the window before the user's bolt sees
them. Even when tuples are evicted by the compaction mechanism, they are kept
in memory until the next time a window is presented to the user's bolt, so
compaction does not free memory any sooner.
I think the compaction mechanism should be removed. The only policy that
benefits is the time-based policy, and in that case it would be better to just
add a configurable max tuple count to that policy.
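As a rough illustration of that suggestion, the sketch below combines a
time-based eviction rule with an explicit, user-configured tuple limit. It is
hypothetical throughout: Storm has no class named BoundedTimeWindowSketch, and
the windowLengthMs and maxTupleCount parameters are names invented here. It
only tracks arrival timestamps rather than real tuples.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the proposal; Storm has no class by this name.
class BoundedTimeWindowSketch {
    private final long windowLengthMs;
    private final int maxTupleCount;
    // Arrival timestamps of the tuples currently in the window, oldest first.
    private final Deque<Long> arrivalTimes = new ArrayDeque<>();

    BoundedTimeWindowSketch(long windowLengthMs, int maxTupleCount) {
        if (windowLengthMs <= 0 || maxTupleCount <= 0) {
            throw new IllegalArgumentException("window length and max count must be positive");
        }
        this.windowLengthMs = windowLengthMs;
        this.maxTupleCount = maxTupleCount;
    }

    // Records a tuple arriving at nowMs and returns how many old tuples were evicted.
    int add(long nowMs) {
        arrivalTimes.addLast(nowMs);
        int evicted = 0;
        // Evict from the oldest end while a tuple is too old or the window is over the limit.
        while (!arrivalTimes.isEmpty()
                && (nowMs - arrivalTimes.peekFirst() >= windowLengthMs
                    || arrivalTimes.size() > maxTupleCount)) {
            arrivalTimes.removeFirst();
            evicted++;
        }
        return evicted;
    }

    int size() {
        return arrivalTimes.size();
    }
}
```

Unlike the hard-coded cap, the limit here is chosen by the user, so any tuples
dropped because of it are dropped by explicit configuration rather than as a
side effect.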