uranusjr commented on issue #59294: URL: https://github.com/apache/airflow/issues/59294#issuecomment-3942855922
OK, I got some time to think this through.

1. We should support one-to-many mapping (i.e. allow `to_downstream` to return more than one downstream partition key from one upstream key) since there is a use case for it.
2. _What_ keys exactly are returned from one upstream key needs to be user-configurable, and we’re not going to do that for 3.2. (We’re not going to officially support user-defined PartitionMapper yet.)
3. Therefore, let’s not support one-to-many mapping for now, since it would lead to confusion if the exact keys emitted do not meet user expectations.
4. However, we should support `to_downstream` returning _either_ an iterable or a plain value, to avoid user error and simplify simple cases. Things could be difficult to debug if only iterables are supported, because the return value can only be validated at runtime when the scheduler is running.

So I think what I’m going to do is not make any change in PartitionMapper for now, but leave a todo note where it’s called (for creating dag runs) so we can handle more than one value (any non-str iterable) in the future. When we do, we probably also need to add a configuration option to cap the number of values returned, to avoid a PartitionMapper clogging up the scheduler (e.g. by returning an infinite iterator).
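A minimal sketch of how the call site might handle both return shapes in the future. The helper name `normalize_downstream_keys` and the `max_keys` parameter are hypothetical and do not exist in Airflow, and the default cap of 100 is arbitrary:

```python
from collections.abc import Iterable
from itertools import islice
from typing import Union


def normalize_downstream_keys(
    result: Union[str, Iterable],
    max_keys: int = 100,  # hypothetical cap; a real one would come from configuration
) -> list:
    """Coerce a to_downstream() result into a bounded list of partition keys."""
    # A plain string is a single key; never iterate it character by character.
    if isinstance(result, str):
        return [result]
    if not isinstance(result, Iterable):
        raise TypeError(f"expected str or iterable of str, got {type(result).__name__}")
    # Pull at most max_keys + 1 items so an infinite iterator cannot hang the caller.
    keys = list(islice(result, max_keys + 1))
    if len(keys) > max_keys:
        raise ValueError(f"mapper returned more than {max_keys} partition keys")
    for key in keys:
        if not isinstance(key, str):
            raise TypeError(f"partition key must be str, got {type(key).__name__}")
    return keys
```

Using `islice` keeps the check lazy: the helper stops consuming the iterable as soon as the cap is exceeded, so a mapper that yields forever fails fast instead of blocking the scheduler loop.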
