mtauha commented on PR #37672:
URL: https://github.com/apache/beam/pull/37672#issuecomment-3938825406
> Fix GroupBy snippet tests and re-enable skipped assertions (fixes #30778)
The GroupBy snippet tests in groupby_test.py were permanently skipped via
skip_due_to_30778 = True. This PR fixes the underlying issues in the snippet
files so all 7 tests pass. Root causes fixed:
>
> Missing strawberry entry in GROCERY_LIST across all snippet files caused
incorrect aggregate results if test block was outside the with beam.Pipeline()
context, making grouped inaccessible beam.Map(print) was chained inside the
pipeline, polluting the snippet and breaking test assertions
check_simple_aggregate_result was applying beam.MapTuple(normalize_kv) to Row
objects returned by aggregate_field, which expects KV pairs
>
> Files changed:
>
> groupby_attr.py groupby_attr_expr.py groupby_expr.py
groupby_expr_aggregate.py groupby_global_aggregate.py
groupby_simple_aggregate.py groupby_two_exprs.py groupby_test.py
>
> Testing: pytest -q
sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py
7 passed in 0.99s
>
> addresses #30778 Update CHANGES.md with noteworthy changes.
Hi @MansiSingh17! Great work identifying and fixing the root causes. Your
approach is correct. I was also working on this issue and reviewed your PR
carefully. I noticed a few things that might need attention:
1. **`skip_due_to_30778` flag**: It doesn't look like the skip flag in
`groupby_test.py` has been removed. If it's still set to `True`, the tests will
continue to be skipped and won't actually run, which means the fix can't be
verified by CI.
2. **`groupby_attr_expr.py`: unconditional `beam.Map(print)`**... The
`grouped | beam.Map(print)` line runs unconditionally, even during tests. It
should be inside an `else` block like the other files:
```python
if test:
test(grouped)
else:
grouped | beam.Map(print)
```
3. **Removed `[START/END]` snippet markers**: In `groupby_attr_expr.py` and
`groupby_two_exprs.py`, the `# [START ...]` and `# [END ...]` markers were
removed. These are used by the Apache Beam documentation pipeline to extract
code snippets for the website, so removing them could break doc generation.
4. **`groupby_simple_aggregate.py`: unnecessary `to_grocery_row()` helper**.
Since `GROCERY_LIST` already contains `beam.Row` objects, the added
`to_grocery_row()` conversion function is redundant and adds unnecessary
complexity to what is meant to be a readable example snippet.
Happy to help address any of these if useful! Great effort overall on
tackling this. 🙌
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]