comphead commented on PR #3349:
URL: 
https://github.com/apache/datafusion-comet/pull/3349#issuecomment-3862980021

   Thanks @mbutrovich, I checked this at a high level; please correct me if I'm wrong.
So before this PR, all tasks were sent to the executors.
   
   A task example is below:
   ```
   FileScanTask {
     - file_path: "/data/sales/date=2024-01-01/file-001.parquet"
     - schema: [columns and types]
     - partition_data: {date: "2024-01-01", region: "US"}
     - partition_spec: how data is partitioned
     - residual_filter: additional filters to apply
     - file_size: 128MB
     - record_count: 1,000,000 rows
     - delete_files: [any delete files to apply]
     - column_stats: min/max values, null counts
   }
   ```
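   To make the shape of such a task concrete, here is a minimal Python sketch that models the fields from the example as a dataclass. The names `FileScanTask`, `ColumnStats` and `file_size_bytes` are hypothetical and only mirror the example. They are not the actual Iceberg or Comet API. The schema columns are also illustrative, because the example only says "columns and types".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class ColumnStats:
    # Per-column statistics: min/max values and null count.
    min_value: Any
    max_value: Any
    null_count: int


@dataclass
class FileScanTask:
    # Hypothetical model of the example's fields; not the Iceberg/Comet API.
    file_path: str
    schema: Tuple[Tuple[str, str], ...]   # (column name, type) pairs
    partition_data: Dict[str, str]        # partition column -> value
    partition_spec: Tuple[str, ...]       # partition columns, in order
    residual_filter: Optional[str]        # filter still applied after pruning, if any
    file_size_bytes: int
    record_count: int
    delete_files: Tuple[str, ...] = ()    # delete files to apply to this data file
    column_stats: Dict[str, ColumnStats] = field(default_factory=dict)


task = FileScanTask(
    file_path="/data/sales/date=2024-01-01/file-001.parquet",
    schema=(("date", "date"), ("region", "string")),  # illustrative columns only
    partition_data={"date": "2024-01-01", "region": "US"},
    partition_spec=("date", "region"),
    residual_filter=None,
    file_size_bytes=128 * 1024 * 1024,  # "128MB" from the example, taken as MiB
    record_count=1_000_000,
)
```

   Each task carries per-file metadata such as statistics and delete files. This is why sending tasks that an executor never reads is not free.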
   
   However, a particular executor would use only a few of the transferred tasks?
Does this PR distribute FileScanTasks between executors so that only the needed ones
are sent? If so, what is the distribution algorithm, and do you expect any increase in
shuffle if executors read data that is not colocated?
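   To frame the question, here is a minimal Python sketch that contrasts the two behaviors:
- shipping every task to every partition;
- assigning each task to exactly one partition.

The greedy, largest-file-first bin-packing in `assign_per_partition` is an assumed algorithm chosen for illustration. It is not necessarily what this PR does. `Task`, `broadcast_all`, `assign_per_partition` and `tasks_shipped` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    # Minimal stand-in for a FileScanTask; only what this sketch needs.
    file_path: str
    file_size_bytes: int


def broadcast_all(tasks: List[Task], num_partitions: int) -> Dict[int, List[Task]]:
    # Every partition (and so every executor running one) receives the full task list.
    return {p: list(tasks) for p in range(num_partitions)}


def assign_per_partition(tasks: List[Task], num_partitions: int) -> Dict[int, List[Task]]:
    # Assumed algorithm: greedy bin-packing by file size. Each task goes to the
    # currently lightest partition, so each partition receives only its own tasks.
    buckets: Dict[int, List[Task]] = {p: [] for p in range(num_partitions)}
    load = [0] * num_partitions
    # Placing the largest files first tends to balance greedy packing better.
    for task in sorted(tasks, key=lambda t: t.file_size_bytes, reverse=True):
        target = min(range(num_partitions), key=lambda p: (load[p], p))
        buckets[target].append(task)
        load[target] += task.file_size_bytes
    return buckets


def tasks_shipped(assignment: Dict[int, List[Task]]) -> int:
    # Total number of task descriptions serialized across all partitions.
    return sum(len(ts) for ts in assignment.values())


tasks = [Task(f"file-{i:03d}.parquet", mb * 1024 * 1024)
         for i, mb in enumerate([128, 64, 64, 32, 32, 16])]
print(tasks_shipped(broadcast_all(tasks, 3)))         # 18: each of 3 partitions gets all 6 tasks
print(tasks_shipped(assign_per_partition(tasks, 3)))  # 6: each task is shipped once
```

   The sketch ignores data locality (preferred locations). That is exactly the part of the question about executors reading data that is not colocated.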


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
