uranusjr commented on issue #59294:
URL: https://github.com/apache/airflow/issues/59294#issuecomment-3942855922

   OK, I got some time to think this through.
   
   1. We should support one-to-many mapping (i.e. allow `to_downstream` to 
return more than one downstream partition key from one upstream key) since 
there is a use case for it.
   2. _What_ keys exactly are returned from one upstream key needs to be 
user-configurable, and we’re not going to do it for 3.2. (We’re not going to 
officially support user-defined PartitionMapper yet.)
   3. Therefore, let’s not support one-to-many mapping for now, since that’d 
lead to confusion if the exact keys emitted do not match user expectations.
   4. However, we should allow `to_downstream` to return _either_ an iterable 
or a plain value, to avoid user error and keep simple cases simple. If only 
iterables were supported, mistakes would be hard to debug because the return 
value can only be validated at runtime, while the scheduler is running.
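
   To illustrate point 4, here is a minimal sketch of how a caller might accept 
either shape of return value. `normalize_downstream_keys` is a hypothetical 
helper, not an existing Airflow function, and it assumes partition keys are 
plain strings.

```python
from __future__ import annotations

from collections.abc import Iterable


def normalize_downstream_keys(result: str | Iterable[str]) -> list[str]:
    """Coerce a mapper's return value into a list of partition keys.

    Hypothetical helper for illustration only; assumes keys are str.
    """
    # A str is itself iterable, so it has to be checked first. Otherwise
    # "2024-01-01" would be split into single characters.
    if isinstance(result, str):
        return [result]
    if isinstance(result, Iterable):
        keys = list(result)
        for key in keys:
            if not isinstance(key, str):
                raise TypeError(f"partition key must be str, got {type(key).__name__}")
        return keys
    raise TypeError(f"expected str or iterable of str, got {type(result).__name__}")
```

   The str check comes first because a bare string would otherwise pass the 
iterable check and silently produce one key per character.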
   
   So I think what I’m going to do is not make any change in PartitionMapper 
for now, but leave a TODO note where it’s called (for creating dag runs) so we 
can handle more than one value (any non-str iterable) in the future. When we 
do, we probably also need to add a configuration option to cap the number of 
values returned, to keep a PartitionMapper from clogging up the scheduler (by 
e.g. returning an infinite iterator).
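
   A rough sketch of what such a cap could look like, assuming the limit comes 
from some configuration value. `take_capped` and its `limit` parameter are 
hypothetical names; no such option exists in Airflow today.

```python
from __future__ import annotations

import itertools
from collections.abc import Iterable


def take_capped(keys: Iterable[str], limit: int) -> list[str]:
    """Consume at most ``limit`` keys; fail if the iterable yields more.

    Hypothetical helper for illustration only.
    """
    # Pull limit + 1 items at most, so an infinite iterator is never
    # fully consumed and the scheduler loop cannot hang on it.
    taken = list(itertools.islice(keys, limit + 1))
    if len(taken) > limit:
        raise ValueError(f"mapper returned more than {limit} partition keys")
    return taken
```

   Raising instead of silently truncating is a design choice in this sketch; 
dropping the extra keys quietly would bring back the "keys do not match 
expectations" confusion described in point 3.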
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
