TheNeuralBit commented on a change in pull request #11766:
URL: https://github.com/apache/beam/pull/11766#discussion_r435422511
##########
File path: sdks/python/apache_beam/dataframe/expressions.py
##########
@@ -85,16 +87,10 @@ def evaluate_at(self, session): # type: (Session) -> T
"""Returns the result of self with the bindings given in session."""
raise NotImplementedError(type(self))
- def requires_partition_by_index(self): # type: () -> bool
- """Whether this expression requires its argument(s) to be partitioned
- by index."""
- # TODO: It might be necessary to support partitioning by part of the index,
- # for some args, which would require returning more than a boolean here.
+ def requires_partition_by(self): # type: () -> Partitioning
raise NotImplementedError(type(self))
- def preserves_partition_by_index(self): # type: () -> bool
- """Whether the result of this expression will be partitioned by index
- whenever all of its inputs are partitioned by index."""
+ def preserves_partition_by(self): # type: () -> Partitioning
Review comment:
Ah that makes sense. And I guess the name "preserves" is actually
intuitive now that I understand it's setting an upper bound on the output
partitioning.
I think the complexity is worth it, unless there's a chance those operations
will never materialize. Can you just add a docstring indicating that
"preserves" sets an upper bound on the output partitioning (or any other
language to make sure readers can grok it)? A similar comment about requires
would be good too.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]