[GitHub] [arrow-datafusion] alamb commented on issue #1916: Discussion: Is Ballista a standalone system or framework

GitBox Sat, 12 Mar 2022 10:58:50 -0800


alamb commented on issue #1916:
URL: https://github.com/apache/arrow-datafusion/issues/1916#issuecomment-1065943124



   > Also, as a standalone system, Ballista will compete with the heavy weights 
in the category (Spark, Presto..). That is an interesting but very ambitious 
goal 😄
   
   DataFusion is not JVM based, which could be an interesting differentiator.
   
   I think making a generic embedded distributed framework will be challenging, 
as there are so many dimensions to consider (catalog structure, local caching, 
etc.) that may differ across use cases.
   
   Comparatively, I think a single-node, column-oriented analytic query engine 
is a fairly well understood pattern (though I do think the DataFusion 
implementation is very good :bowtie: )
   
   One thing I personally hope is that Ballista drives features into DataFusion 
so that building a new distributed engine on top of DataFusion becomes easier 
over time.
   
   Some examples of this technical flow I think are: 
   1. The extraction of `datafusion-proto` struct serialization by 
@carols10cents 
   2. The object store abstraction from @yjshen 
   3. The listing table provider from @rdettai 
   4. Making planning `async` 
   5. The work that @mingmwang is doing to enable intra-process concurrency
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
