I wanted to share a quick update on the status of Ballista. Ballista is now capable of running some of the TPC-H benchmark queries against the 1TB data set. I documented the benchmark results for DataFusion, Ballista, and Apache Spark for reference, here:
https://github.com/andygrove/ballista-research/wiki/Ballista-benchmarks-results

Here is a chart showing these initial results. As noted on the wiki page, I didn't put too much thought into the configurations used but they are at least documented. I now plan on iterating on my benchmark process to measure memory usage and show scalability as executors/cores are added for each solution.

[image: sf1kcsv.png]

Note that I hit failures with all three solutions and this is why there are columns missing from the chart.

I'll try and get something more compelling written up for a blog post to coincide with the upcoming release of Ballista 0.5.0 but I figured folks might be interested in these informal interim results.

Thanks,

Andy.