alamb commented on issue #1916: URL: https://github.com/apache/arrow-datafusion/issues/1916#issuecomment-1065943124
> Also, as a standalone system, Ballista will compete with the heavy weights in the category (Spark, Presto..).

That is an interesting but very ambitious goal 😄 DataFusion is not JVM based, which could be an interesting differentiator.

I think making a generic embedded distributed framework will be challenging, as there are so many dimensions to consider (catalog structure, local caching, etc.) that may differ from one system to the next. By comparison, I think a single-node, column-oriented analytic query engine is a fairly well understood pattern (though I do think the DataFusion implementation is very good :bowtie:).

One thing I personally hope is that Ballista drives features into DataFusion, so that building a new distributed engine on DataFusion becomes easier over time. Some examples of this technical flow, I think, are:

1. The extraction of `datafusion-proto` struct serialization by @carols10cents
2. The object store abstraction from @yjshen
3. The listing table provider from @rdettai
4. Making planning `async`
5. The work that @mingmwang is doing to enable intra-process concurrency
