chadrik commented on a change in pull request #11038:
URL: https://github.com/apache/beam/pull/11038#discussion_r422348125
##########
File path: sdks/python/apache_beam/io/iobase.py
##########
@@ -79,6 +82,23 @@
_LOGGER = logging.getLogger(__name__)
+
+class Position(Protocol):
+ def __lt__(self, other):
+ pass
+
+ def __le__(self, other):
+ pass
+
+ def __gt__(self, other):
+ pass
+
+ def __ge__(self, other):
+ pass
+
+
+PositionT = TypeVar('PositionT', bound='Position')
Review comment:
The `TypeVar` lets us make relationships between methods.
Consider this snippet of `RangeTracker`:
```python
class RangeTracker(Generic[PositionT]):
def start_position(self):
# type: () -> Optional[PositionT]
"""Returns the starting position of the current range, inclusive."""
raise NotImplementedError(type(self))
def stop_position(self):
# type: () -> Optional[PositionT]
"""Returns the ending position of the current range, exclusive."""
raise NotImplementedError(type(self))
```
What would happen if we used `Position` instead of `PositionT`? For
starters, we can't make it generic, since it's meaningless to make something
generic on a concrete type:
```python
class RangeTracker(object):
def start_position(self):
# type: () -> Optional[Position]
"""Returns the starting position of the current range, inclusive."""
raise NotImplementedError(type(self))
def stop_position(self):
# type: () -> Optional[Position]
"""Returns the ending position of the current range, exclusive."""
raise NotImplementedError(type(self))
```
The class above ensures that `start_position()` and `stop_position()` both
return a type that meets the `Position` protocol, but it doesn't ensure that
they return the _same_ type: could be a `str` for start and an `int` for stop.
We use a TypeVar to relate types within a function signature, as well as to
relate types between methods of a class.
The use-case for TypeVars is even more apparent on
`range_trackers.OrderedPositionRangeTracker` because it implements some
concrete methods that `PositionT`, whereas `RangeTracker` is entirely abstract.
One thing to note is that my use of `PositionT` on `BoundedSource.split()`
would be more useful if `SourceBundle` were `Generic`, as we could do this:
```python
def split(self,
desired_bundle_size, # type: int
start_position=None, # type: Optional[PositionT]
stop_position=None, # type: Optional[PositionT]
):
# type: (...) -> Iterator[SourceBundle[PositionT]]
```
Unfortunately, there's not an easy way to make a `NamedTuple` generic before
python 3.7.
Sorry this took so long to answer!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]