mtauha opened a new pull request, #37675:
URL: https://github.com/apache/beam/pull/37675

   ## Description
   
   This PR adds a native Python `GenerateSequence` bounded PTransform to the
   Python SDK, equivalent to the Java SDK's `GenerateSequence` (formerly known
   as `CountingInput`).
   
   Addresses #18088
   
   ### Motivation
   
   The Python SDK previously had no native equivalent of Java's `GenerateSequence`
   / `CountingInput` transform. The only existing Python implementation
   (`apache_beam/io/external/generate_sequence.py`) requires a Java expansion
   service and only works with the Flink runner, making it inaccessible to most
   Python users.
   
   This PR introduces a pure Python implementation that works on **all runners**
   (DirectRunner, Dataflow, etc.) without any Java dependency.
   
   ### Changes
   
   - Added `sdks/python/apache_beam/io/generate_sequence.py`:
     - `GenerateSequence` — a `PTransform` that produces a bounded sequence
       of integers from `start` (inclusive) to `stop` (exclusive)
     - `_BoundedCountingSource` — a `BoundedSource` backed by
       `OffsetRangeTracker`, supporting efficient splitting and dynamic
       work rebalancing across workers
   - Added `sdks/python/apache_beam/io/generate_sequence_test.py` with unit
     tests covering basic usage, edge cases, splitting behaviour, and
     size estimation
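
To make the splitting and size-estimation behaviour described above concrete, here is a minimal pure-Python sketch. It is not the code from this PR and does not use Beam. The helper names (`estimate_size`, `split_range`, `read_range`) and the 8-bytes-per-element cost are assumptions chosen for illustration only.

```python
from typing import Iterator, List, Tuple

# Assumption: each generated integer is costed as one 64-bit value.
BYTES_PER_ELEMENT = 8


def estimate_size(start: int, stop: int) -> int:
    """Rough byte size of the half-open range [start, stop)."""
    return max(0, stop - start) * BYTES_PER_ELEMENT


def split_range(
    start: int, stop: int, desired_bundle_size: int
) -> List[Tuple[int, int]]:
    """Split [start, stop) into contiguous half-open sub-ranges.

    desired_bundle_size is in bytes; every bundle holds at least one element.
    """
    per_bundle = max(1, desired_bundle_size // BYTES_PER_ELEMENT)
    bundles = []
    lo = start
    while lo < stop:
        hi = min(lo + per_bundle, stop)
        bundles.append((lo, hi))
        lo = hi
    return bundles


def read_range(lo: int, hi: int) -> Iterator[int]:
    """Yield the integers of one bundle, lo inclusive to hi exclusive."""
    return iter(range(lo, hi))
```

Reading every bundle in order reproduces the original sequence with no gaps or duplicates, and an empty range (`start == stop`) yields no bundles and an estimated size of zero. These are the kinds of properties the unit tests listed above would be expected to check.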
   
   ### Notes
   
   - This is **Phase 1 (bounded only)**. Unbounded streaming support with
     rate limiting will follow in a separate PR.
   - The existing external Flink-only version at
     `apache_beam/io/external/generate_sequence.py` is **untouched**.
   - Implementation is modelled after the Java `CountingSource.java` and
     follows the same `BoundedSource` + `OffsetRangeTracker` pattern used
     by other Python SDK IO sources.
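
For readers unfamiliar with the tracker pattern mentioned above, the following is a simplified, hypothetical sketch of how claiming offsets and splitting off a residual range interact during dynamic work rebalancing. `ToyOffsetTracker` is an illustrative stand-in, not Beam's `OffsetRangeTracker`, which has a richer interface.

```python
import threading
from typing import Optional, Tuple


class ToyOffsetTracker:
    """Hypothetical tracker for the half-open range [start, stop)."""

    def __init__(self, start: int, stop: int) -> None:
        self._start = start
        self._stop = stop
        self._last_claimed: Optional[int] = None
        self._lock = threading.Lock()

    def try_claim(self, offset: int) -> bool:
        """Return True if the reader may emit `offset`, else False."""
        with self._lock:
            if offset >= self._stop:
                return False
            self._last_claimed = offset
            return True

    def try_split(self, split_offset: int) -> Optional[Tuple[int, int]]:
        """Shrink this range to [start, split_offset).

        Returns the residual range (split_offset, old_stop), or None if the
        split point is out of range or has already been claimed.
        """
        with self._lock:
            if self._last_claimed is not None and split_offset <= self._last_claimed:
                return None
            if not (self._start < split_offset < self._stop):
                return None
            residual = (split_offset, self._stop)
            self._stop = split_offset
            return residual


# Demo: a reader works through [0, 10); a split at 6 arrives mid-read.
tracker = ToyOffsetTracker(0, 10)
primary = []
residual = None
offset = 0
while tracker.try_claim(offset):
    primary.append(offset)
    if offset == 2:
        # Simulates the runner asking to hand the tail to another worker.
        residual = tracker.try_split(6)
    offset += 1
```

In this sketch the original reader stops after offset 5, and the residual range `(6, 10)` could be given to another worker. A later split request at an offset that has already been claimed is rejected, so no element is emitted twice.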
   
   ### Testing
   
   ```bash
   cd sdks/python
   python -m pytest apache_beam/io/generate_sequence_test.py -v
   ```
