Sounds interesting, as we want to start using DataFusion. Btw, I vaguely remember that in the original repository you had an issue like "investigate DataFusion with Gandiva". I'm curious why you decided to give up on it?
On Thu, Aug 13, 2020 at 5:11 PM Andy Grove <andygrov...@gmail.com> wrote:
>
> Some of you may have noticed a sudden flurry of activity from me after a
> bit of a break from the project, so I thought it might be useful to explain
> what I am up to.
>
> As of 1.0.0, DataFusion isn't really useful against any real-world data
> sets for a number of reasons, but most of all due to the simplistic
> threading/partitioning model. There are a few small bugs as well.
>
> My current focus is to be able to run TPC-H query 1 against decent size
> datasets (starting with the 100 GB dataset) with hundreds of partitions. I
> believe that I can get this working with some fairly small changes. Later,
> we can experiment with more advanced threading models and async, using the
> same benchmark to measure improvements.
>
> Let me know if you have any questions.
>
> Thanks,
>
> Andy.

--
Best regards,
Kirill Lykov