mustafasrepo commented on PR #10095:
URL: https://github.com/apache/datafusion/pull/10095#issuecomment-2074203264

   > > Maybe we can insert RepartitionExec on top UnionExecs if their output 
partition number > config.target_partitions. By this way, we can guarantee this 
violation wouldn't propagate to other operators.
   > 
   > Probably better solution would be planning union inputs execution 
according to total available partitions -- e.g
   > 
   > ```sql
   >     select l_linenumber as f
   >     from lineitem
   >     union all
   >     select l_orderkey as f
   >     from lineitem
   > ```
   > 
   > with target_partitions = 4, could plan 2 threads for each ParquetExec 
(ideally we could also use byte/row statistics and plan according to them -- 
not only 2-2, but probably 1-3 if there is significant data skew across 
inputs/files).
   > 
   > Currently, with target_partitions = 4, it's planned as 4 threads per 
ParquetExec, and 8 output partitions for UNION.
   > 
   > And on top of it, when target_partitions is less then number of UNION 
inputs (e.g. UNION has 10 inputs, target_partitions = 4, and we need at least 1 
thread for each input) there could be RepartitionExec.
   
   That might work. However, this approach cannot solve all cases I guess. For 
the following query
    ```sql
   select * from table
   union all
   select * from table
   union all 
   select * from table
    ```sql
   when `target_partitions=2`, there is no way to change input partitions to 
generate 2 partitions after `union`. We would generate at least `3` 
partitions(assuming each input generates single partition). Hence, your 
suggestion may not solve all cases. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to