tustvold commented on PR #5057:
URL: https://github.com/apache/arrow-datafusion/pull/5057#issuecomment-1404027622

   I only had time for a brief look at this PR, so I'm likely missing 
something, but please bear with me :smile: 
   
   This PR modifies `ListingTable` to pair each `PartitionedFile` with a 
`Vec<Option<FileRange>>`. This makes the approach specific to `ListingTable`, 
and it also adds parallelism control to a part of the system that doesn't 
really have context on how much parallelism is needed, nor on what invariants, 
such as sort orders, may need to be upheld.
   
   I have two suggestions that may be stupid:
   
   * Make this a physical optimizer rule that looks at operators containing a 
`FileScanConfig` and adds more partitions based on the `target_partitions` 
property.
   * Rather than adding a new `FileRanges` property, use the existing 
`range: Option<FileRange>` already present on `PartitionedFile`. The same file 
can then appear in multiple partitions, each time with a disjoint range.
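
   To make the second suggestion concrete, here is a rough sketch of how one file could appear in several partitions with disjoint ranges. The `FileRange` and `PartitionedFile` structs below are simplified stand-ins written for this example, not the real DataFusion definitions, and `repartition_by_range` is a hypothetical helper name. It assumes the file sizes are known up front and that a byte offset is an acceptable split point.

```rust
// Simplified stand-ins for illustration only. The real DataFusion types
// carry more fields; these keep just what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    start: i64, // inclusive byte offset
    end: i64,   // exclusive byte offset
}

#[derive(Debug, Clone, PartialEq)]
struct PartitionedFile {
    path: String,
    size: i64,
    range: Option<FileRange>,
}

/// Hypothetical helper: spread the bytes of `files` over at most
/// `target_partitions` groups. A file that straddles a group boundary is
/// emitted once per group, each copy carrying a disjoint `range`.
/// Zero-length files contribute no bytes and are skipped in this sketch.
fn repartition_by_range(
    files: &[PartitionedFile],
    target_partitions: usize,
) -> Vec<Vec<PartitionedFile>> {
    let total: i64 = files.iter().map(|f| f.size).sum();
    if target_partitions == 0 || total == 0 {
        // Nothing to split: keep everything in a single group.
        return vec![files.to_vec()];
    }

    // Bytes per group, rounded up so that no more than
    // `target_partitions` groups are produced.
    let n = target_partitions as i64;
    let chunk = (total + n - 1) / n;

    let mut groups: Vec<Vec<PartitionedFile>> = vec![Vec::new()];
    let mut room = chunk; // bytes still free in the current group

    for file in files {
        let mut offset = 0;
        while offset < file.size {
            if room == 0 {
                // Current group is full, start the next one.
                groups.push(Vec::new());
                room = chunk;
            }
            let take = room.min(file.size - offset);
            groups.last_mut().unwrap().push(PartitionedFile {
                path: file.path.clone(),
                size: file.size,
                range: Some(FileRange { start: offset, end: offset + take }),
            });
            offset += take;
            room -= take;
        }
    }
    groups
}
```

   With a 100-byte file, a 50-byte file, and three target partitions, the first file lands in two groups with ranges `0..50` and `50..100`, and the second file fills the third group. The sketch deliberately ignores invariants such as sort order, which is exactly the kind of context a physical optimizer rule would need to check before splitting.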

