olehborysevych commented on code in PR #29507: URL: https://github.com/apache/beam/pull/29507#discussion_r1411113948
##########
learning/prompts/documentation-lookup/08_basic_windowing.md:
##########
@@ -0,0 +1,28 @@
+Prompt:
+What is Windowing in Apache Beam?
+Response:
+Windowing is a key concept in stream processing, as it allows you to divide streams of data into logical units for efficient and correct parallel processing.
+With an unbounded data set, it is impossible to collect all of the elements, since new elements are constantly being added. In the Beam model, any PCollection (including unbounded PCollections) can be subdivided into [logical windows](https://beam.apache.org/documentation/programming-guide/#windowing-basics). Grouping transforms then consider each PCollection’s elements on a per-window basis.
+
+Since Beam's default windowing strategy is to assign each element to a single, global window, you must explicitly specify a [windowing function](https://beam.apache.org/documentation/programming-guide/#setting-your-pcollections-windowing-function) for your pipeline.
+
+The following code snippet shows how to divide a PCollection into 60-second windows:
+```python
+import apache_beam as beam
+from apache_beam import window
+fixed_windowed_items = (
+    items | 'window' >> beam.WindowInto(window.FixedWindows(60)))
+```
+
+Beam provides a number of [built-in windowing functions](https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions) that you can use to subdivide your data into windows:
+- Fixed Time Windows
+- Sliding Time Windows
+- Per-Session Windows
+- Single Global Window
+- Calendar-based Windows (not supported by the Beam SDK for Python or Go)
+
+You can also create your own custom windowing function [WindowFn](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/window.py).
+
+You also need to specify a [triggering strategy](https://beam.apache.org/documentation/programming-guide/#triggers) to determine when to emit the results of your pipeline’s windowed computations.
+
+You can adjust the windowing strategy to allow for [late data](https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data), that is, data that arrives after the watermark has passed the end of the window. You can also specify how late data is handled: data that arrives within the allowed lateness is still added to its window, and data that arrives later is discarded.

Review Comment:
Done
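
To make the "60-second windows" in the quoted snippet concrete, here is a minimal plain-Python sketch of how fixed windows partition timestamped elements and how a grouping step then works per window. It does not use the Beam SDK: the helper names `fixed_window_start` and `count_per_window` and the sample `events` are hypothetical and exist only for illustration, and the arithmetic is an assumption about how fixed windows behave rather than a copy of Beam's implementation.

```python
from collections import defaultdict


def fixed_window_start(timestamp, size, offset=0):
    """Return the start of the fixed window of `size` seconds that contains `timestamp`."""
    # Windows are aligned to `offset`, `offset + size`, `offset + 2 * size`, ...
    return timestamp - (timestamp - offset) % size


def count_per_window(timestamped_items, size):
    """Count elements per half-open window [start, start + size)."""
    counts = defaultdict(int)
    for _item, timestamp in timestamped_items:
        start = fixed_window_start(timestamp, size)
        counts[(start, start + size)] += 1
    return dict(counts)


# Hypothetical (element, event time in seconds) pairs.
events = [("a", 5), ("b", 59), ("c", 60), ("d", 125)]
window_counts = count_per_window(events, size=60)
print(window_counts)
```

Each element lands in exactly one window, so a per-window count never mixes elements from different 60-second intervals. In a real pipeline, `beam.WindowInto` does the assignment and a grouping transform such as a per-key combine does the aggregation.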
