andygrove commented on pull request #8029:
URL: https://github.com/apache/arrow/pull/8029#issuecomment-678784408


   > This is an impressive simplification and improvement. Really great work, 
@andygrove !
   > 
   > I went through it and could not find any issue with it, only benefits.
   > 
   > To make sure I understood this:
   > 
   >     * scans infer the number of partitions based on the number of files 
they scan
   > 
   >     * the number of partitions is annotated in 
`Partitioning::UnknownPartitioning(N)`
   > 
   >     * the number of partitions is exposed to others via 
`output_partitioning`
   > 
   >     * others use `output_partitioning` to know up to which partition 
number they need to run against to run through all input partitions.
   > 
   > 
   > The invariant is that the maximum number that `execute(...)` accepts 
equals to the number exposed in `output_partitioning`, i.e. if a plan passes 
through that number, a `panic!` happens.
   
   Yes, your understanding is correct on all of these points.
   
    
   > I could think of an alternative approach where a plan outputs an iterator 
over partitions, so that others can iterate against (which would avoid us 
having to guarantee the invariant mentioned above through discipline), but you 
probably already though through this alternative :)
   
   Typically, we want to try and execute the partitions in parallel, either on 
separate threads in a single process or in separate processes in a cluster. We 
may want to use `async` for the execute method in the future.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to