jnturton commented on issue #2421: URL: https://github.com/apache/drill/issues/2421#issuecomment-1006287333
Okay, @paul-rogers, I've had a few swigs of the Kool-Aid by now and I think I'm ready to forget about in-memory column orientation and SIMD in return for the benefits of row orientation. For workflows that do involve bulk arithmetic, I can imagine good interop taking care of that stage:

1. Do some efficient parsing, filtering, sorting and aggregating in Drill.
2. Smoothly switch over to pandas/NumPy (perhaps via an Arrow exporter?), Julia, or similar.
3. Do the bulk arithmetic using SIMD.
4. Store the results, or smoothly switch back to Drill.

I've used this workflow myself, with Parquet as the data interchange format and the DFS as the transport medium (so perhaps more "clunky" than "smooth", with a lot of serialisation and I/O incurred).

Going further, if decoupling Drill from its in-memory format, as mentioned earlier in this thread, is a real possibility, could we even imagine something like this, entirely within Drill?

```
alter session set exec.memory_format = 'drill'; -- the default, row-oriented format
create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
alter session set exec.memory_format = 'arrow'; -- switch to Arrow format
create table as select ... -- do some bulk arithmetic using SIMD
create table as select ... -- do some bulk arithmetic using SIMD
```

To my mind, Drill 2.0 would not try to ship support for the latter (Arrow) format; it would merely make design decisions that leave that door open for a motivated developer...
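To make the row-versus-column trade-off in the interop workflow (steps 1 to 4) concrete, here is a minimal plain-Python sketch of what an "Arrow exporter" hand-off would do conceptually. It pivots a row-oriented batch into one contiguous, homogeneous buffer per column, so the bulk-arithmetic stage can work column at a time. The `rows_to_columns` helper and the sample data are hypothetical and purely illustrative. This is neither Drill's nor Arrow's API. The Python loop itself does not use SIMD. Real speed-ups would come from NumPy or Arrow compute kernels running over the same columnar layout.

```python
from array import array

# Illustrative row-oriented batch: each record keeps all of its fields together.
rows = [
    ("a", 1.0, 10),
    ("b", 2.5, 20),
    ("c", 4.0, 30),
]


def rows_to_columns(rows, typecodes):
    """Hypothetical exporter step: pivot row tuples into one buffer per column.

    typecodes holds an array-module typecode per column ("d" = double,
    "q" = signed 64-bit int), or None to keep that column as a plain list
    (e.g. strings, which have no fixed-width typecode).
    """
    columns = []
    for i, code in enumerate(typecodes):
        values = [row[i] for row in rows]
        columns.append(array(code, values) if code else values)
    return columns


names, prices, qty = rows_to_columns(rows, [None, "d", "q"])

# Column-at-a-time arithmetic: walk two contiguous, homogeneous buffers in step.
# In pure Python this is still an interpreted loop; vectorised NumPy/Arrow
# kernels over this same layout are what can exploit SIMD instructions.
totals = array("d", (p * q for p, q in zip(prices, qty)))
print(list(totals))  # [10.0, 50.0, 120.0]
```

The design point is that the pivot is paid once at the hand-off boundary. Drill could keep its row-oriented format for parsing, filtering, sorting and aggregating, while the arithmetic stage gets the columnar layout it benefits from.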