chadrik commented on a change in pull request #11038:
URL: https://github.com/apache/beam/pull/11038#discussion_r422348125



##########
File path: sdks/python/apache_beam/io/iobase.py
##########
@@ -79,6 +82,23 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+
+class Position(Protocol):
+  def __lt__(self, other):
+    pass
+
+  def __le__(self, other):
+    pass
+
+  def __gt__(self, other):
+    pass
+
+  def __ge__(self, other):
+    pass
+
+
+PositionT = TypeVar('PositionT', bound='Position')

Review comment:
       The `TypeVar` lets us make relationships between methods.  
   
   Consider this snippet of `RangeTracker`:
   
   ```python
   class RangeTracker(Generic[PositionT]):
   
     def start_position(self):
       # type: () -> Optional[PositionT]
   
       """Returns the starting position of the current range, inclusive."""
       raise NotImplementedError(type(self))
   
     def stop_position(self):
       # type: () -> Optional[PositionT]
   
       """Returns the ending position of the current range, exclusive."""
       raise NotImplementedError(type(self))
   ```
   
    What would happen if we used `Position` instead of `PositionT`?   For 
starters, we can't make it generic, since it's meaningless to make something 
generic on a concrete type:
   
   ```python
   class RangeTracker(object):
   
     def start_position(self):
       # type: () -> Optional[Position]
   
       """Returns the starting position of the current range, inclusive."""
       raise NotImplementedError(type(self))
   
     def stop_position(self):
       # type: () -> Optional[Position]
   
       """Returns the ending position of the current range, exclusive."""
       raise NotImplementedError(type(self))
   ```
   
   The class above ensures that `start_position()` and `stop_position()` both 
return a type that meets the `Position` protocol, but it doesn't ensure that 
they return the _same_ type: could be a `str` for start and an `int` for stop.  
 We use a TypeVar to relate types within a function signature, as well as to 
relate types between methods of a class.
   
   The use-case for TypeVars is even more apparent on 
`range_trackers.OrderedPositionRangeTracker` because it implements some 
concrete methods that `PositionT`, whereas `RangeTracker` is entirely abstract.
   
   One thing to note is that my use of `PositionT` on `BoundedSource.split()` 
would be more useful if `SourceBundle` were `Generic`, as we could do this:
   
   ```python
     def split(self,
               desired_bundle_size,  # type: int
               start_position=None,  # type: Optional[PositionT]
               stop_position=None,  # type: Optional[PositionT]
              ):
       # type: (...) -> Iterator[SourceBundle[PositionT]]
   ```
   
   Unfortunately, there's not an easy way to make a `NamedTuple` generic before 
python 3.7.
   
   Sorry this took so long to answer!
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to