tvalentyn commented on code in PR #35647: URL: https://github.com/apache/beam/pull/35647#discussion_r2225761190
##########
website/www/site/content/en/documentation/transforms/python/other/waiton.md:
##########
@@ -0,0 +1,65 @@
+---
+title: "WaitOn"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# WaitOn
+
+`WaitOn` delays the processing of a main `PCollection` until one or more other `PCollections` (signals) have finished processing. This is useful for enforcing ordering or dependencies between different parts of a pipeline, especially when some outputs interact with external systems (such as writing to a database).
+
+When you apply `WaitOn`, the elements of the main `PCollection` will not be processed until all the specified signal `PCollections` have completed. In streaming mode, this is enforced per window: the corresponding window of each waited-on `PCollection` must be complete before elements are passed through.
+
+## Examples
+
+```python
+import apache_beam as beam
+from apache_beam.transforms.util import WaitOn
+
+# Example 1: Basic usage
+with beam.Pipeline() as p:
+    main = p | 'CreateMain' >> beam.Create([1, 2, 3])
+    side = p | 'CreateSide' >> beam.Create(['a', 'b', 'c'])

Review Comment:
How about we rename 'side' to 'signal' and add a processing step after `Create`, so that the signal actually involves some processing? Even something as simple as sleeping for a few seconds would do.
##########
website/www/site/content/en/documentation/transforms/python/other/waiton.md:
##########
@@ -0,0 +1,65 @@
+---
+title: "WaitOn"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# WaitOn
+
+`WaitOn` delays the processing of a main `PCollection` until one or more other `PCollections` (signals) have finished processing. This is useful for enforcing ordering or dependencies between different parts of a pipeline, especially when some outputs interact with external systems (such as writing to a database).

Review Comment:
How about the following wording in both the Python and Java docs:

`WaitOn` returns a `PCollection` with the contents identical to the input `PCollection`, but delays the downstream processing until one or more other `PCollections` (signals) have finished processing. This is useful for enforcing ordering or dependencies between different parts of a pipeline, especially when some outputs interact with external systems (such as writing to a database).

When you apply `WaitOn`, the elements of the main `PCollection` will not be emitted for downstream processing until the computations required to produce the specified signal `PCollections` have completed. In streaming mode, this is enforced per window: the corresponding window of each waited-on `PCollection` must close before elements are passed through.
##########
website/www/site/content/en/documentation/transforms/java/other/wait.md:
##########
@@ -0,0 +1,56 @@
+---
+title: "Wait.On"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Wait.On
+
+`Wait.On` is a transform that delays the processing of a main `PCollection` until one or more other `PCollections` (signals) have finished processing. This is useful for enforcing ordering or dependencies between different parts of a pipeline, especially when some outputs interact with external systems (such as writing to a database).
+
+When you apply `Wait.On`, the elements of the main `PCollection` will not be processed until all the specified signal `PCollections` have completed. In streaming mode, this is enforced per window: the corresponding window of each waited-on `PCollection` must be complete before elements are passed through.
+
+## Examples
+
+```java
+// Example 1: Basic usage
+PCollection<String> main = ...;
+PCollection<Void> signal = ...;
+
+// Wait for 'signal' to complete before processing 'main'
+// Elements pass through unchanged after 'signal' finishes
+PCollection<String> result = main.apply(Wait.on(signal));

Review Comment:
1. Could you please add a link to the javadoc: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Wait.html
2. There is no downstream processing after `Wait.on`. Could we add something? You could also reuse the example from the Javadoc if you don't feel like making a full example.
##########
website/www/site/content/en/documentation/transforms/python/overview.md:
##########
@@ -82,4 +82,6 @@ limitations under the License.
   most useful for adjusting parallelism or preventing coupled failures.</td></tr>
   <tr><td><a href="/documentation/transforms/python/other/windowinto">WindowInto</a></td><td>Logically divides up or groups the elements of a collection into finite windows according to a function.</td></tr>
+  <tr><td><a href="/documentation/transforms/python/other/waiton">WaitOn</a></td><td>Delays processing of a PCollection until other PCollections have finished processing.</td></tr>

Review Comment:
Could you please add Wait/WaitOn to the sidebar on the left-hand side of the page? Thanks! See (e.g.) Transform Catalog > Java > Other. The currently rendered version (available at https://apache-beam-website-pull-requests.storage.googleapis.com/35647/documentation/transforms/java/other/wait/index.html) doesn't have it.

##########
website/www/site/content/en/documentation/transforms/python/other/waiton.md:
##########
@@ -0,0 +1,65 @@
+---
+title: "WaitOn"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# WaitOn
+
+`WaitOn` delays the processing of a main `PCollection` until one or more other `PCollections` (signals) have finished processing. This is useful for enforcing ordering or dependencies between different parts of a pipeline, especially when some outputs interact with external systems (such as writing to a database).
+
+When you apply `WaitOn`, the elements of the main `PCollection` will not be processed until all the specified signal `PCollections` have completed. In streaming mode, this is enforced per window: the corresponding window of each waited-on `PCollection` must be complete before elements are passed through.
+
+## Examples
+
+```python
+import apache_beam as beam
+from apache_beam.transforms.util import WaitOn
+
+# Example 1: Basic usage
+with beam.Pipeline() as p:
+    main = p | 'CreateMain' >> beam.Create([1, 2, 3])
+    side = p | 'CreateSide' >> beam.Create(['a', 'b', 'c'])
+
+    # Wait for 'side' to complete before processing 'main'
+    # Elements [1, 2, 3] pass through unchanged after 'side' finishes
+    result = main | 'WaitOnSide' >> WaitOn(side)
+    result | beam.Map(print)
+
+# Example 2: Using multiple signals
+with beam.Pipeline() as p:
+    main = p | 'CreateMain' >> beam.Create([1, 2, 3])
+    side1 = p | 'CreateSide1' >> beam.Create(['a', 'b', 'c'])
+    side2 = p | 'CreateSide2' >> beam.Create(['x', 'y', 'z'])
+
+    # Wait for both side1 and side2 to complete before processing main
+    result = main | 'WaitOnSides' >> WaitOn(side1, side2)
+    result | beam.Map(print)
+
+# Example 3: Streaming mode with windowing
+with beam.Pipeline() as p:
+    main = (p | 'CreateMain' >> beam.Create([1, 2, 3])
+            | "ApplyWindow" >> beam.WindowInto(FixedWindows(5 * 60)))
+    side = (p | 'CreateSide' >> beam.Create(['a', 'b', 'c'])

Review Comment:
There are no timestamps associated with the elements after `Create`, and there will be an error if two transforms are labelled 'ApplyWindow'. I would remove example 3.