I wanted to share a quick update on the status of Ballista.

Ballista is now capable of running some of the TPC-H benchmark queries
against the 1TB data set. I documented the benchmark results for
DataFusion, Ballista, and Apache Spark for reference, here:

https://github.com/andygrove/ballista-research/wiki/Ballista-benchmarks-results

Here is a chart showing these initial results. As noted on the wiki page, I
didn't put much thought into the configurations used, but they are at
least documented. I now plan to iterate on my benchmark process to
measure memory usage and show scalability as executors/cores are added for
each solution.

[image: sf1kcsv.png]

Note that I hit failures with all three solutions, which is why some
columns are missing from the chart.

I'll try to get something more compelling written up for a blog post to
coincide with the upcoming release of Ballista 0.5.0, but I figured folks
might be interested in these informal interim results.

Thanks,

Andy.
