damccorm commented on code in PR #34912:
URL: https://github.com/apache/beam/pull/34912#discussion_r2084679998

##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -1184,6 +1184,45 @@ func init() {
 </span>

+{{< paragraph class="language-python">}}
+Proper Use of return vs yield in Python Functions.
+In Python, functions can return results either all at once (return) or lazily one at a time (yield). The right choice depends on your use case, especially when dealing with large datasets or streaming scenarios.

Review Comment:
   We don't need to include information on how Python generally handles `return` vs `yield`; we should talk specifically about how Beam handles them.


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -1184,6 +1184,45 @@ func init() {
 </span>

+{{< paragraph class="language-python">}}
+Proper Use of return vs yield in Python Functions.
+In Python, functions can return results either all at once (return) or lazily one at a time (yield). The right choice depends on your use case, especially when dealing with large datasets or streaming scenarios.
+{{< /paragraph >}}
+
+
+{{< highlight python >}}
+# Returning a single string instead of a sequence
+def get_lines_wrong():
+    return "Line 1"  # Not iterable in the way most expect
+
+# Returning a list of strings
+def get_lines_as_list():
+    return ["Line 1", "Line 2", "Line 3"]  # Eager, but valid
+
+# Yielding each line one at a time
+def get_lines_generator():
+    with open("data.txt") as f:
+        for line in f:
+            yield line.strip()  # Lazy and memory-efficient
+
+{{< /highlight >}}
+
+
+<span class="language-python">
+
+> **Note:**
+>
+> - **Returning a single element (e.g., `return element`) is incorrect**
+> The `process` method in Beam must return an *iterable* of elements. Returning a single value like an integer or string (e.g., `return element`) leads to a runtime error (`TypeError: 'int' object is not iterable`). Always ensure your return type is iterable.

Review Comment:
   ```suggestion
   > The `process` method in Beam must return an *iterable* of elements. Returning a single value (e.g., `return element`) leads either to a runtime error when the value is not iterable, such as an integer (`TypeError: 'int' object is not iterable`), or to incorrect results when the value is itself iterable, such as a string, because Beam treats the return value as an iterable of elements. Always ensure your return type is iterable.
   ```


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -1184,6 +1184,45 @@ func init() {
 </span>

+{{< paragraph class="language-python">}}
+Proper Use of return vs yield in Python Functions.
+In Python, functions can return results either all at once (return) or lazily one at a time (yield). The right choice depends on your use case, especially when dealing with large datasets or streaming scenarios.
+{{< /paragraph >}}
+
+
+{{< highlight python >}}
+# Returning a single string instead of a sequence
+def get_lines_wrong():
+    return "Line 1"  # Not iterable in the way most expect
+
+# Returning a list of strings
+def get_lines_as_list():
+    return ["Line 1", "Line 2", "Line 3"]  # Eager, but valid
+
+# Yielding each line one at a time
+def get_lines_generator():
+    with open("data.txt") as f:
+        for line in f:
+            yield line.strip()  # Lazy and memory-efficient
+
+{{< /highlight >}}
+
+
+<span class="language-python">
+
+> **Note:**
+>
+> - **Returning a single element (e.g., `return element`) is incorrect**
+> The `process` method in Beam must return an *iterable* of elements. Returning a single value like an integer or string (e.g., `return element`) leads to a runtime error (`TypeError: 'int' object is not iterable`). Always ensure your return type is iterable.
+>
+> - **Returning a list (e.g., `return [element1, element2]`) is valid but eager**
+> This method is syntactically correct and works for small numbers of outputs. However, it builds the entire list in memory before returning it, which can increase memory consumption and impact performance for large data sets.
+>
+> - **Using `yield` (e.g., `yield element`) is preferred for scalability**
+> Using `yield` turns the method into a generator function. This enables lazy evaluation, where each element is processed and emitted one at a time. It’s more memory-efficient and better suited for large pipelines or streaming workloads.

Review Comment:
   We can get rid of the eager vs not eager piece. In the context of Beam it won't necessarily matter since the data may be materialized by Beam anyways.


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -1184,6 +1184,45 @@ func init() {
 </span>

+{{< paragraph class="language-python">}}
+Proper Use of return vs yield in Python Functions.
+In Python, functions can return results either all at once (return) or lazily one at a time (yield). The right choice depends on your use case, especially when dealing with large datasets or streaming scenarios.
+{{< /paragraph >}}
+
+
+{{< highlight python >}}
+# Returning a single string instead of a sequence
+def get_lines_wrong():
+    return "Line 1"  # Not iterable in the way most expect
+
+# Returning a list of strings
+def get_lines_as_list():
+    return ["Line 1", "Line 2", "Line 3"]  # Eager, but valid
+
+# Yielding each line one at a time
+def get_lines_generator():
+    with open("data.txt") as f:
+        for line in f:
+            yield line.strip()  # Lazy and memory-efficient
+
+{{< /highlight >}}
+
+
+<span class="language-python">
+
+> **Note:**
+>
+> - **Returning a single element (e.g., `return element`) is incorrect**
+> The `process` method in Beam must return an *iterable* of elements. Returning a single value like an integer or string (e.g., `return element`) leads to a runtime error (`TypeError: 'int' object is not iterable`). Always ensure your return type is iterable.

Review Comment:
   Could we add examples for each of these?
   For example, something like:

   ```
   import apache_beam as beam

   # Incorrect: returns the element itself instead of an iterable
   class ReturnIndividualElement(beam.DoFn):
       def process(self, element):
           return element

   with beam.Pipeline() as pipeline:
       examples = (
           pipeline
           | "CreateExamples" >> beam.Create(["foo"])
           | "Map" >> beam.ParDo(ReturnIndividualElement())
           | "Print" >> beam.Map(print)
       )

   # prints:
   # f
   # o
   # o
   ```

   and then similar examples with correct return types
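
   As a rough, Beam-free sketch of why the three return styles behave differently: the helper `emit_outputs` below is a hypothetical stand-in for a runner that simply iterates over whatever `process` returns. It is an assumption made for illustration, not Beam's actual implementation, and the `process_*` functions are illustrative names rather than real `DoFn` classes.

```python
def emit_outputs(process_result):
    """Hypothetical stand-in for a runner: iterate over what process() returned."""
    if process_result is None:
        return []
    return list(process_result)


def process_return_element(element):
    # Incorrect: returns the element itself, not an iterable of elements.
    return element


def process_return_list(element):
    # Correct: returns an iterable that contains the element.
    return [element]


def process_yield(element):
    # Correct: a generator is an iterable of the yielded elements.
    yield element


# A string is itself iterable, so it is silently split into characters.
print(emit_outputs(process_return_element("foo")))  # ['f', 'o', 'o']
print(emit_outputs(process_return_list("foo")))     # ['foo']
print(emit_outputs(process_yield("foo")))           # ['foo']

# A non-iterable value fails as soon as the result is iterated.
try:
    emit_outputs(process_return_element(5))
except TypeError as err:
    print(err)  # 'int' object is not iterable
```

   In an actual pipeline, the two correct variants would presumably be written as `process` methods on `beam.DoFn` subclasses and applied with `beam.ParDo`, in place of the incorrect `DoFn` in the example above.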