cisaacstern commented on PR #27618:
URL: https://github.com/apache/beam/pull/27618#issuecomment-1788145104

   > GroupByKey is failing in the tests under certain conditions which I 
suspect may have to do with [window 
merging](https://beam.apache.org/contribute/runner-guide/#window-merging) not 
working correctly.
   
   Confirming that forcing a single partition resolves all of the GBK-related 
failures in the tests, i.e.
   
   ```diff
   class Create(DaskBagOp):
     """The beginning of a Beam pipeline; the input must be `None`."""
     def apply(self, input_bag: OpInput) -> db.Bag:
       assert input_bag is None, 'Create expects no input!'
       original_transform = t.cast(_Create, self.transform)
       items = original_transform.values
   -   return db.from_sequence(items)
   +   return db.from_sequence(items, npartitions=1)
   ```
   I am still wrapping my mind around the relationship between DaskRunner 
_windows_ and dask bag _partitions_... but I think the latter is the inner 
implementation of the former? If so, then I think it's fair to say that the 
remaining work here is, as guessed above, to implement window (i.e., partition) 
merging here. Continuing on that now...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to