TheNeuralBit commented on a change in pull request #11766: URL: https://github.com/apache/beam/pull/11766#discussion_r434916844
########## File path: sdks/python/apache_beam/dataframe/expressions.py ########## @@ -85,16 +87,10 @@ def evaluate_at(self, session): # type: (Session) -> T """Returns the result of self with the bindings given in session.""" raise NotImplementedError(type(self)) - def requires_partition_by_index(self): # type: () -> bool - """Whether this expression requires its argument(s) to be partitioned - by index.""" - # TODO: It might be necessary to support partitioning by part of the index, - # for some args, which would require returning more than a boolean here. + def requires_partition_by(self): # type: () -> Partitioning raise NotImplementedError(type(self)) - def preserves_partition_by_index(self): # type: () -> bool - """Whether the result of this expression will be partitioned by index - whenever all of its inputs are partitioned by index.""" + def preserves_partition_by(self): # type: () -> Partitioning Review comment: Ah makes sense. So perhaps "preserves" could be thought of as an upper bound on the partitioning of the output (similar to how "requires" is a lower bound on the partitioning of the input). It looks like every current expression has preserves set to either Nothing or Singleton. Wouldn't it be simpler to just keep preserves as a boolean? Or maybe you have some other expression in mind where a boolean won't be sufficient? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org