cisaacstern commented on PR #27618: URL: https://github.com/apache/beam/pull/27618#issuecomment-1788145104
> GroupByKey is failing in the tests under certain conditions which I suspect may have to do with [window merging](https://beam.apache.org/contribute/runner-guide/#window-merging) not working correctly.

Confirming that forcing a single partition resolves all of the GBK-related failures in the tests, i.e.

```diff
 class Create(DaskBagOp):
   """The beginning of a Beam pipeline; the input must be `None`."""
   def apply(self, input_bag: OpInput) -> db.Bag:
     assert input_bag is None, 'Create expects no input!'
     original_transform = t.cast(_Create, self.transform)
     items = original_transform.values
-    return db.from_sequence(items)
+    return db.from_sequence(items, npartitions=1)
```

I am still wrapping my mind around the relationship between DaskRunner _windows_ and dask bag _partitions_... but I think the latter is the inner implementation of the former? If so, then I think it's fair to say that the remaining work here is, as guessed above, to implement window (i.e., partition) merging here. Continuing on that now...
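To illustrate why a single partition might hide a GroupByKey problem, here is a minimal sketch in plain Python. It does not use dask or Beam and does not reproduce the DaskRunner code. It assumes one hypothetical failure mode: grouping is applied independently inside each partition and the per-partition groups are never merged. All function names below (`partition`, `group_within`, `group_per_partition`, `group_merged`) are invented for the illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

KV = Tuple[Hashable, int]


def partition(items: List[KV], npartitions: int) -> List[List[KV]]:
    """Split items into contiguous chunks (hypothetical stand-in for partitions)."""
    size = -(-len(items) // npartitions)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def group_within(part: List[KV]) -> Dict[Hashable, List[int]]:
    """Group values by key inside a single partition."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in part:
        groups[key].append(value)
    return dict(groups)


def group_per_partition(parts: List[List[KV]]) -> List[Tuple[Hashable, List[int]]]:
    """Assumed failure mode: emit each partition's groups without merging them."""
    return [(k, vs) for part in parts for k, vs in group_within(part).items()]


def group_merged(parts: List[List[KV]]) -> List[Tuple[Hashable, List[int]]]:
    """Merge per-partition groups so each key appears exactly once."""
    merged: Dict[Hashable, List[int]] = defaultdict(list)
    for part in parts:
        for key, values in group_within(part).items():
            merged[key].extend(values)
    return sorted(merged.items())


items = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# Two partitions, no merging: keys "a" and "b" each show up twice.
print(group_per_partition(partition(items, 2)))

# One partition: the per-partition result is already the global grouping.
print(group_per_partition(partition(items, 1)))

# Two partitions with an explicit merge step: same result as one partition.
print(group_merged(partition(items, 2)))
```

Under that assumption, `npartitions=1` works only because there is nothing left to merge, which is consistent with the observation above but does not confirm it as the actual cause.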
