dariabezkorovaina commented on code in PR #30284:
URL: https://github.com/apache/beam/pull/30284#discussion_r1487812410
########## learning/prompts/documentation-lookup/29_advanced_side_inputs.md: ##########
@@ -0,0 +1,34 @@
+Prompt:
+What are side inputs in Apache Beam?
+
+Response:
+In Apache Beam, side inputs are additional inputs that can be provided to a [‘ParDo’ transform](https://beam.apache.org/documentation/programming-guide/#pardo), in addition to the main input [`PCollection`](https://beam.apache.org/documentation/basics/#pcollection). These side inputs can be used by the `ParDo` transform's `DoFn` to access and read additional data during the processing of each element in the main input `PCollection`.
+Side inputs are particularly useful when a Beam pipeline needs to ingest additional data dynamically at runtime when processing each element, rather than relying on preset or hard-coded values. By leveraging side inputs, Apache Beam can determine additional data based on the main `PCollection` data or even from a different branch of your pipeline. For example, you can leverage side inputs for your streaming analytics use cases, enriching streaming events with additional data retrieved from a lookup table.
+
+In the Apache Beam Python SDK, you can provide side inputs as extra arguments in the `DoFn`'s `process` method or the `Map`/`FlatMap` transforms. The Python SDK supports optional, positional, and keyword arguments.
+
+```python
+class MyDoFn(beam.DoFn):
+  def process(self, element, side_input):
+    ...
+```
+
+In the Java SDK, side inputs are provided to the `ParDo` transform by using the `.withSideInputs` method. To access a side input from within a `DoFn`, the `DoFn.ProcessContext.sideInput` method is invoked.
+
+```java
+PCollection<Integer> input = ...;
+PCollectionView<Integer> sideInput = ...;
+PCollection<Integer> output =
+  input.apply(ParDo.of(new DoFn<Integer, Integer>() {
+    @ProcessElement
+    public void processElement(ProcessContext c) {
+      Integer sideInputValue = c.sideInput(sideInput);
+      ...
+    }
+  }).withSideInputs(sideInput));
+```
+
+Side inputs can also be used to process windowed data. Apache Beam uses the windows for the main input elements to look up the appropriate windows for the side input elements. Apache Beam projects the main input’s window to the side input's window set and uses the side input from the resulting window. Main and side inputs might have identical or different windowing strategies.
+For example, if the main input `PCollection` is windowed into ten-minute windows, and the side input is windowed into one-hour windows, Apache Beam projects the main input window set and selects the value from the corresponding one-hour side input window. For more information, refer to [side inputs and windowing](https://beam.apache.org/documentation/programming-guide/#side-inputs-windowing).
+
+For common patterns and code samples in Apache Beam pipelines to implement side inputs, refer to [side input patterns](https://beam.apache.org/documentation/patterns/side-inputs/).

Review Comment:
```suggestion
For common patterns and code samples to implement side inputs in Apache Beam pipelines, refer to [side input patterns](https://beam.apache.org/documentation/patterns/side-inputs/).
```
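To make the ten-minute/one-hour windowing example in the diff concrete, here is a minimal plain-Python model of the projection. It is not Beam API code. It assumes fixed windows with zero offset, integer millisecond timestamps, a window maximum timestamp of `end - 1`, and the default mapping for fixed side-input windows, which picks the side-input window that contains the main window's maximum timestamp. `SimpleWindow`, `assign_fixed_window`, and `project_to_side_window` are hypothetical helpers written only for this illustration.

```python
from dataclasses import dataclass

MINUTE_MS = 60 * 1000  # Assumption: timestamps are integer milliseconds.


@dataclass(frozen=True)
class SimpleWindow:
    """Hypothetical half-open window [start, end); not Beam's window class."""
    start: int
    end: int

    def max_timestamp(self) -> int:
        # Latest timestamp that still falls inside the window.
        return self.end - 1


def assign_fixed_window(timestamp_ms: int, size_ms: int) -> SimpleWindow:
    """Return the fixed window (zero offset) of length size_ms containing timestamp_ms."""
    start = timestamp_ms - (timestamp_ms % size_ms)
    return SimpleWindow(start, start + size_ms)


def project_to_side_window(main_window: SimpleWindow, side_size_ms: int) -> SimpleWindow:
    """Map a main-input window to the fixed side-input window containing its max timestamp."""
    return assign_fixed_window(main_window.max_timestamp(), side_size_ms)


if __name__ == "__main__":
    ten_minutes = 10 * MINUTE_MS
    one_hour = 60 * MINUTE_MS
    # An element at minute 47 lands in the ten-minute main window [40 min, 50 min).
    main = assign_fixed_window(47 * MINUTE_MS, ten_minutes)
    # That main window projects onto the one-hour side-input window [0 min, 60 min).
    side = project_to_side_window(main, one_hour)
    print(main, side)
```

Under these assumptions, all six ten-minute main windows within an hour read the side input from that same one-hour window, and the next main window starts reading from the following hour.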
