rdettai commented on pull request #972:
URL: https://github.com/apache/arrow-datafusion/pull/972#issuecomment-918986641


   Thanks @alamb and @houqp for your insights and @xudong963 for quickly 
reacting to this feedback! But I am still not 100% convinced 😃
   - in the context of an engine like 
[Buzz](https://github.com/cloudfuse-io/buzz-rust), where the number of CPUs is 
meant to be fully elastic, I would prefer to specify a partition size and no 
target count. I understand that `target_partition_size` could be added later, 
but it bothers me that `target_partitions` is not optional, because I 
wouldn't know what to specify for it.
   - Spark currently allows no parallelism hint to be passed to the datasource, 
and in that case the datasource comes up with a partition count of its own. I 
find this behavior intuitive, but that might just be because it is what I am 
used to 😄 
   > I think of target_partitions as "target concurrency" 
   - I would say that there isn't a one-to-one equivalence between parallelism 
and partition count. Usually, the partition count can be much larger than the 
parallelism, and tasks for the extra partitions are queued. So if we mean to 
hint a "target concurrency" to the table providers, I think we should name this 
configuration as such.
   - this is a personal opinion, but I am usually skeptical of global parameters 
that are meant to be interpreted differently by different implementations.
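
To make the first and third points above concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: `ScanHints`, `partition_count`, `waves` and the `default_partition_size` parameter are names made up for illustration and are not part of the DataFusion API. It assumes both hints are optional, that the provider falls back to a size-based default when no count is given, and that extra partitions are simply queued behind the available workers.

```rust
/// Hypothetical hints a table provider might receive; both are optional.
#[derive(Debug, Clone, Copy, Default)]
struct ScanHints {
    /// Desired number of partitions, if the caller has an opinion.
    target_partitions: Option<usize>,
    /// Desired number of bytes per partition, if the caller has an opinion.
    target_partition_size: Option<u64>,
}

/// Picks a partition count for a source of `total_bytes` bytes.
/// An explicit count wins; otherwise the count is derived from a partition
/// size (the hinted one, or the provider's own default).
fn partition_count(total_bytes: u64, hints: ScanHints, default_partition_size: u64) -> usize {
    if let Some(n) = hints.target_partitions {
        return n.max(1);
    }
    let size = hints
        .target_partition_size
        .unwrap_or(default_partition_size)
        .max(1);
    // Ceiling division without overflow; an empty source still gets one partition.
    let count = total_bytes / size + u64::from(total_bytes % size != 0);
    count.max(1) as usize
}

/// Number of scheduling rounds needed to run `partitions` tasks on
/// `concurrency` workers when the extra tasks are queued.
fn waves(partitions: usize, concurrency: usize) -> usize {
    let workers = concurrency.max(1);
    partitions / workers + usize::from(partitions % workers != 0)
}
```

With a size hint of 100 bytes, a 1000-byte source yields 10 partitions regardless of how many CPUs are available, and with 4 workers those 10 tasks run in 3 rounds. The partition count and the concurrency are separate quantities here, which is why I would rather not conflate them in a single parameter.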

