[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

GitBox Wed, 05 Jan 2022 21:04:09 -0800


jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1006287333



   Okay, @paul-rogers, I've had a few swigs of the Kool-Aid by now and I think 
I'm ready to forget about in-memory column orientation and SIMD in return for 
the benefits of row orientation.  For workflows that do involve bulk arithmetic, 
I can imagine good interop taking care of that stage:
   
   1. Do some efficient parsing, filtering, sorting, aggregating in Drill
   2. Smoothly switch over to Pandas/Numpy (perhaps an Arrow exporter?) or 
Julia or ...
   3. Do bulk arithmetic using SIMD
   4. Store results or smoothly switch back to Drill
   
   I've used this workflow myself where the data interchange format was Parquet 
and the transport medium was the DFS (so perhaps a bit more "clunky" than 
"smooth", with lots of serialisation and IO incurred).
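
The hand-off in step 2 amounts to pivoting row-oriented records into contiguous per-column buffers, so that step 3 can run over one column at a time. A minimal sketch in plain Python, using the standard library's `array` module as a stand-in for an Arrow or NumPy buffer. The sample rows and the `rows_to_columns` helper are made up for illustration, and pure Python will not use SIMD here; the sketch only shows the layout change.

```python
from array import array


def rows_to_columns(rows, schema):
    """Pivot a list of row tuples into one typed, contiguous array per column.

    `schema` maps column name -> array typecode ('q' = int64, 'd' = float64).
    Hypothetical helper for illustration; not part of Drill or Arrow.
    """
    columns = {name: array(code) for name, code in schema.items()}
    for row in rows:
        # Dicts preserve insertion order, so names line up with row positions.
        for name, value in zip(schema, row):
            columns[name].append(value)
    return columns


# Row-oriented batch, as it might look after filtering/sorting (illustrative data).
rows = [(1, 2.0, 3.0), (2, 4.0, 0.5), (3, 1.5, 2.0)]
schema = {"id": "q", "price": "d", "qty": "d"}

cols = rows_to_columns(rows, schema)

# "Bulk arithmetic" over whole columns; a columnar engine would vectorise this loop.
total = array("d", (p * q for p, q in zip(cols["price"], cols["qty"])))
```

In the Parquet-over-DFS variant described above, the same pivot happens implicitly when the file is written and read back, at the cost of the extra serialisation and IO.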
   
   Going further, if the decoupling of Drill from its in-memory format 
mentioned above is a real possibility then can we even imagine something like 
this, entirely in Drill?
   
   ```
   alter session set exec.memory_format = 'drill'; -- the default, row-oriented format
   
   create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   
   alter session set exec.memory_format = 'arrow'; -- switch to Arrow format
   
   create table as select ... -- do some bulk arithmetic using SIMD
   create table as select ... -- do some bulk arithmetic using SIMD
   ```
   
   To my mind, Drill 2.0 would not try to ship support for the latter (Arrow) 
format, but merely make design decisions that leave that door open for a 
motivated developer...
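
One way to picture "leaving that door open": operators are written against a small batch interface, and the session option only selects which implementation sits behind it. The sketch below is purely hypothetical. None of `Batch`, `RowBatch`, `ColumnBatch` or `FORMATS` exist in Drill, and the keys `'drill'` and `'arrow'` are simply borrowed from the `exec.memory_format` example above.

```python
from typing import Iterator, Protocol, Sequence


class Batch(Protocol):
    """Everything an operator is allowed to know about a batch of records."""

    def rows(self) -> Iterator[tuple]: ...


class RowBatch:
    """Row-oriented storage: one tuple per record."""

    def __init__(self, records: Sequence[tuple]):
        self._records = list(records)

    def rows(self) -> Iterator[tuple]:
        return iter(self._records)


class ColumnBatch:
    """Column-oriented storage: one list per column."""

    def __init__(self, records: Sequence[tuple]):
        self._columns = [list(col) for col in zip(*records)]

    def rows(self) -> Iterator[tuple]:
        # Re-assemble records on demand from the per-column lists.
        return zip(*self._columns)


# Hypothetical registry keyed by the value of exec.memory_format.
FORMATS = {"drill": RowBatch, "arrow": ColumnBatch}


def filter_positive(batch: Batch, col: int) -> list:
    """An operator that only touches the interface, so either format works."""
    return [r for r in batch.rows() if r[col] > 0]


records = [(1, -2.0), (2, 3.5), (3, 0.0)]
results = {name: filter_positive(cls(records), 1) for name, cls in FORMATS.items()}
```

The point of the sketch is only that `filter_positive` never learns which layout it was given; a later columnar implementation could then be added without touching the operators.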


