[GitHub] flink pull request #3191: [FLINK-5529] [FLINK-4752] [docs] Improve / extends...

aljoscha Wed, 25 Jan 2017 09:32:29 -0800

Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3191#discussion_r97833288
  
    --- Diff: docs/dev/windows.md ---
    @@ -758,8 +831,33 @@ input
     val input: DataStream[T] = ...
     
     input
    -    .windowAll(<window assigner>)
    +    .keyBy(<key selector>)
    +    .window(<window assigner>)
    +    .allowedLateness(<time>)
         .<windowed transformation>(<window function>)
     {% endhighlight %}
     </div>
     </div>
    +
    +<span class="label label-info">Note</span> When using the `GlobalWindows` 
window assigner no
    +data is ever considered late because the end timestamp of the global 
window is `Long.MAX_VALUE`.
    +
    +### Late elements considerations
    +
    +When specifying an allowed lateness greater than 0, the window along with 
its content is kept after the watermark passes
    +the end of the window. In these cases, when a late but not dropped element 
arrives, it will trigger another firing for the 
    +window. These firings are called `late firings`, as they are triggered by 
late events and in contrast to the `main firing` 
    +which is the first firing of the window. In case of session windows, late 
firings can further lead to merging of windows,
    +as they may "bridge" the gap between two pre-existing, unmerged windows.
    +
    +<span class="label label-info">Attention</span> You should be aware that 
the elements emitted by a late firing should be treated as updated results of a 
previous computation, i.e., your data stream will contain multiple results for 
the same computation. Depending on your application, you need to take these 
duplicated results into account or deduplicate them.
    +
    +## Useful state size considerations
    +
    +Windows can be defined over long periods of time (such as days, weeks, or 
months) and therefore accumulate very large state. There are a couple of rules 
to keep in mind when estimating the storage requirements of your windowing 
computation:
    + 
    +1. Flink creates one copy of each element per window to which it belongs. 
Given this, tumbling windows keep one copy of each element (an element belongs 
to exactly window unless it is dropped late). In contrast, sliding windows 
create several of each element, as explained in the [Window 
Assigners](#window-assigners) section. Hence, a sliding window of size 1 day 
and slide 1 second might not be a good idea.
    +
    +2. `FoldFunction` and `ReduceFunction` can significantly reduce the 
storage requirements, as they eagerly aggregate elements and store only one 
value per window. In contrast a `WindowFunction` must accumulate all elements.
    --- End diff --
    
    In contrast, just using a `WindowFunction` requires accumulating all 
elements.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3191: [FLINK-5529] [FLINK-4752] [docs] Improve / extends...

Reply via email to