damccorm commented on code in PR #34912:
URL: https://github.com/apache/beam/pull/34912#discussion_r2330244749
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -1183,6 +1183,100 @@ func init() {
 > parameters to a single `emitter function`.

 </span>

+{{< paragraph class="language-python">}}

Review Comment:
   ```suggestion
   {{< paragraph class="language-python">}}
   ```
   Nit

##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -1183,6 +1183,100 @@ func init() {
 > parameters to a single `emitter function`.

 </span>

+{{< paragraph class="language-python">}}
+Proper use of return vs yield in Python Functions.
+{{< /paragraph >}}
+
+<span class="language-python">
+
+> **Returning a single element (e.g., `return element`) is incorrect**
+> The `process` method in Beam must return an *iterable* of elements. Returning a single value like an integer or string
+> (e.g., `return element`) leads to a runtime error (`TypeError: 'int' object is not iterable`) or incorrect results since the return value
+> will be treated as an iterable. Always ensure your return type is iterable.
+
+</span>
+
+{{< highlight python >}}
+# Incorrectly Returning a single string instead of a sequence
+class ReturnIndividualElement(beam.DoFn):
+    def process(self, element):
+        return element
+
+with beam.Pipeline() as pipeline:
+    (
+        pipeline
+        | "CreateExamples" >> beam.Create(["foo"])
+        | "MapIncorrect" >> beam.ParDo(ReturnIndividualElement())
+        | "Print" >> beam.Map(print)
+    )
+    # prints:
+    # f
+    # o
+    # o
+{{< /highlight >}}
+
+<span class="language-python">
+
+> **Returning a list (e.g., `return [element1, element2]`) is valid because List is Iterable**
+> This approach works well when emitting multiple outputs from a single call and is easy to read for small datasets.
+
+</span>
+
+{{< highlight python >}}
+# Returning a list of strings
+class ReturnWordsFn(beam.DoFn):
+    def process(self, element):
+        # Split the sentence and return all words longer than 2 characters as a list
+        return [word for word in element.split() if len(word) > 2]
+
+with beam.Pipeline() as pipeline:
+    (
+        pipeline
+        | "CreateSentences_Return" >> beam.Create([ # Create a collection of sentences
+            "Apache Beam is powerful", # Sentence 1
+            "Try it now" # Sentence 2
+        ])
+        | "SplitWithReturn" >> beam.ParDo(ReturnWordsFn()) # Apply the custom DoFn to split words
+        | "PrintWords_Return" >> beam.Map(print) # Print each List of words
+    )
+    # prints:
+    # ['Apache', 'Beam', 'powerful']
+    # ['Try', 'now']

Review Comment:
   ```suggestion
       # Apache
       # Beam
       # powerful
       # Try
       # now
   ```
   When `process` returns an iterable, each item in that iterable becomes a separate element of the output PCollection, so the words are printed one per line rather than as lists.
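To make the point about iterables concrete, here is a minimal plain-Python sketch. It does not use Beam at all: `emulate_pardo` is a hypothetical helper, written only for this illustration, that assumes the runner simply iterates over whatever `process` returns and emits each item it finds. `return_words` and `return_element` are stand-ins for the `process` methods of the two DoFns in the diff.

```python
from typing import Any, Callable, Iterable, Iterator, List


def emulate_pardo(process: Callable[[Any], Any], inputs: Iterable[Any]) -> Iterator[Any]:
    """Hypothetical stand-in for ParDo (an assumption, not Beam's code):
    iterate over each return value and emit its items one by one."""
    for element in inputs:
        result = process(element)
        # Iterating raises TypeError if `result` is not iterable (e.g. an int).
        for output in result:
            yield output


def return_words(sentence: str) -> List[str]:
    # Same logic as ReturnWordsFn.process in the diff above.
    return [word for word in sentence.split() if len(word) > 2]


def return_element(element: Any) -> Any:
    # Same logic as ReturnIndividualElement.process in the diff above.
    return element


# A returned list is unpacked: each word becomes its own output element.
words = list(emulate_pardo(return_words, ["Apache Beam is powerful", "Try it now"]))
print(words)  # ['Apache', 'Beam', 'powerful', 'Try', 'now']

# A returned string is also iterable, so it is unpacked into characters.
chars = list(emulate_pardo(return_element, ["foo"]))
print(chars)  # ['f', 'o', 'o']

# A returned int is not iterable, so iteration fails.
try:
    list(emulate_pardo(return_element, [1]))
    int_error = None
except TypeError as exc:
    int_error = str(exc)
print(int_error)  # 'int' object is not iterable
```

Under that assumption, the list case yields five separate words, which is why the expected output in the docs should list one word per line.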