For Java/JVM there is also a discussion on user@ about dataframe libraries.
On Thu, Mar 18, 2021 at 5:47 AM Andrew Lamb <al...@influxdata.com> wrote:

> The system you describe sounds quite cool. I don't know what is going on in
> the Java world -- as you say, I think there is work afoot for technologies
> similar in use case to DataFusion in C++ (though I suspect the
> implementation will be fairly different).
>
> On Wed, Mar 17, 2021 at 5:37 PM bobtins <bobti...@gmail.com> wrote:
>
> > I missed the talk but watched the video, which was fascinating. It helped
> > me get the whole picture of what DataFusion does, which is impressive. In
> > my previous job, I built a data analysis engine on a smaller scale in
> > Java, so some of the problems that DataFusion tackles are familiar to me.
> >
> > The initial implementation of my engine would load some data from a
> > relational DB into a columnar memory store that I implemented (very much
> > like Arrow); it would then perform various transformations analogous to
> > the logical plan in DataFusion (sort, group, filter, aggregate, etc.), but
> > also supporting OLAP-like multi-level hierarchies and cubes. This query
> > model didn't have a language itself; the UI manipulated an object model
> > which contained the logical plan (although unfortunately the query model
> > was tangled with other layers).
> >
> > This was later enhanced to generate SQL queries so you wouldn't have to
> > load everything into memory, but you could do in-memory operations on top
> > of the SQL result. I came up with an expression language close to SQL
> > which could be translated into in-memory or SQL operations. I had to do
> > something like the merge operator in DataFusion to support multi-stage
> > aggregation (e.g. implement count(x) -> sum(count(x)), average(x) ->
> > sum(sum(x))/sum(count(x)), etc.).
> >
> > Like I said, my framework was nowhere near as heavy-duty as DataFusion +
> > Arrow, but my familiarity with the power of in-memory columnar stores is
> > what drew me to Arrow in the first place.
> >
> > I am curious about how the various language implementations in Arrow are
> > evolving computation frameworks; for Rust, there is DataFusion, and I
> > noticed that there has been a lot of work going on in C++/Python. For
> > Java, it seems like this would be in the realm of Gandiva or the Dremio
> > product...and of course there's Spark! I am still surveying the terrain,
> > but any pointers to work people are doing in Java would be welcome.
> >
> > On 2021/03/12 19:39:16, Andrew Lamb <al...@influxdata.com> wrote:
> > > Here are links to the content, should anyone be interested:
> > >
> > > Query Engine Design and the Rust-Based DataFusion in Apache Arrow
> > > recording: https://www.youtube.com/watch?v=K6eCAVEk4kU
> > > slides (DataFusion content starts on slide 6):
> > > https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934
> > >
> > > On Thu, Mar 4, 2021 at 4:05 PM Andrew Lamb <al...@influxdata.com> wrote:
> > >
> > > > In case anyone is interested in the topic in general or DataFusion in
> > > > particular, I plan a tech talk [1] next week about "Query Engine
> > > > Design and the Rust based DataFusion in Apache Arrow."
> > > >
> > > > If you are curious how (SQL) query engines in general are structured,
> > > > I plan to describe the typical high level architecture, using
> > > > DataFusion as an exemplar.
> > > >
> > > > It will be held next Wednesday, March 10, 2021 at 8:00 am PST | 4:00 pm
> > > > GMT, and posted publicly afterwards.
> > > >
> > > > Andrew
> > > >
> > > > [1] https://www.influxdata.com/community-showcase/influxdb-tech-talks/
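
The multi-stage aggregation mentioned in the quoted message (count(x) -> sum(count(x)), average(x) -> sum(sum(x))/sum(count(x))) can be sketched roughly as below. This is only a minimal Python illustration of the idea, not DataFusion's merge operator and not the code of the engine described in the thread; the names PartialAgg, partial_aggregate, merge, and final_average are hypothetical, and the sample partitions are made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PartialAgg:
    """Intermediate state from the first aggregation stage (hypothetical name)."""
    total: float  # sum(x) over one partition
    count: int    # count(x) over one partition


def partial_aggregate(values: Iterable[float]) -> PartialAgg:
    """Stage 1: reduce one partition to a (sum, count) state."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return PartialAgg(total, count)


def merge(states: Iterable[PartialAgg]) -> PartialAgg:
    """Stage 2: combine partial states; count(x) becomes sum(count(x))."""
    states = list(states)
    return PartialAgg(
        total=sum(s.total for s in states),
        count=sum(s.count for s in states),
    )


def final_average(state: PartialAgg) -> float:
    """average(x) = sum(sum(x)) / sum(count(x))."""
    if state.count == 0:
        raise ValueError("average of an empty input is undefined")
    return state.total / state.count


# Made-up partitions, e.g. rows from separate SQL result chunks.
partitions: List[List[float]] = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
merged = merge(partial_aggregate(p) for p in partitions)
overall_average = final_average(merged)
```

Carrying the sum and the count separately is the point of the rewrite: averaging the per-partition averages directly would weight each partition equally regardless of how many rows it holds, and so would generally give a different answer.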