houqp commented on pull request #972: URL: https://github.com/apache/arrow-datafusion/pull/972#issuecomment-922396736
Sorry for the delay, still trying to catch up with all the interesting design discussions across different PRs. It looks like we all agree that it's better to drive the partitioning logic in table scan using `target_partition_size` (or `max_partition_size`) config. For this particular PR, I agree with @alamb that it is an improvement and doesn't make anything worse. I think the newly added ScanConfig also lays out a good foundation for us to add more scan config in the future including `target_partition_size` and potentially remove `target_partitions`. I also agree with @rdettai that we should avoid unnecessary API churn if possible. However, I do think the table `scan` method is the right place for us to pass in the `target_partition_size` config when we add it in the future. @rdettai do you envision us having to pass in the `target_partition_size` config at a different layer? If so, I can see the value of avoiding the API change in the TableProvider trait. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
