paul-rogers edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1007611673


   @jnturton, one could do something like what you described. However, making 
all of Drill work with Arrow would be a huge amount of work. Optimizations made 
for one format would be sub-optimal for the other. (Example: exchanges.) 
Furthermore, your use case would benefit from vectors only in the project and 
grouping operators.
   
   So, I wonder if we might think about the problem operator-by-operator. If 
you have a compute-heavy phase, might that phase first transform data to 
vectors, apply the compute, then send data along in row format? Every fragment 
does a network exchange: data is read/written anyway. So, perhaps there is 
something that can be done to transform formats at fragment boundaries (he 
says, waving hands wildly...)
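   
   To make the hand-waving slightly more concrete, here is a toy sketch of that 
idea in plain Python. It is not Drill code: the function names, the 
(region, price, qty) schema, and the use of Python lists as stand-ins for 
vectors are all assumptions made up for illustration.

```python
from collections import defaultdict


def rows_to_columns(rows, names):
    """Pivot row tuples into one list per column (a stand-in for vectors)."""
    if not rows:
        return {name: [] for name in names}
    return {name: list(col) for name, col in zip(names, zip(*rows))}


def columns_to_rows(columns, names):
    """Pivot columns back into row tuples for the outgoing exchange."""
    return list(zip(*(columns[name] for name in names)))


def project_and_group(columns):
    """The compute-heavy part, done column-at-a-time."""
    # Projection: revenue = price * qty, computed over whole columns.
    revenue = [p * q for p, q in zip(columns["price"], columns["qty"])]
    # Grouping: sum revenue per region.
    totals = defaultdict(int)
    for region, value in zip(columns["region"], revenue):
        totals[region] += value
    regions = sorted(totals)
    return {"region": regions, "revenue": [totals[r] for r in regions]}


def compute_heavy_fragment(incoming_rows):
    """Rows in, vectors for the compute, rows out at the fragment boundary."""
    columns = rows_to_columns(incoming_rows, ("region", "price", "qty"))
    result = project_and_group(columns)
    return columns_to_rows(result, ("region", "revenue"))


# Hypothetical rows arriving from an upstream exchange.
rows = [("east", 10, 2), ("west", 5, 4), ("east", 1, 3)]
outgoing = compute_heavy_fragment(rows)
```

   The only point is that the format conversion sits at the edges of the 
fragment, where data is being read and written anyway, so the operators inside 
can use whichever layout suits them.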
   
   You'll also see a speedup only for queries without joins. If you have joins, 
then the joins are likely to take the vast majority of the runtime, leaving 
your projection and grouping in the noise. I'm not sure how vectorization can 
help joins; certainly in Drill today, vectors make the join code atrociously 
complex.
   
   This is why DBs (and compiler optimizers) are hard: the answers change based 
on use case...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


