[GitHub] [arrow-datafusion] realno edited a comment on issue #1916: Discussion: Is Ballista a standalone system or framework

GitBox Sun, 13 Mar 2022 14:54:51 -0700


realno edited a comment on issue #1916:
URL: 
https://github.com/apache/arrow-datafusion/issues/1916#issuecomment-1066188955



   > Also, as a standalone system, Ballista will compete with the heavyweights 
in the category (Spark, Presto...). That is an interesting but very ambitious 
goal 😄
   
   I feel some opportunities/differentiators for Ballista are the following:
   1. Non-JVM - this brings benefits such as a lower footprint, better memory 
efficiency, and no GC cost (Rust specific)
   2. A chance for more modern design principles - for example, Spark was 
originally architected to be deployed on bare metal, so it is hard to change 
it to be more cloud friendly
   3. Utilize modern resource management and orchestration technologies - 
reusing mature tools like k8s will simplify Ballista's implementation (it 
probably doesn't need a very complex resource management system anymore) and 
let it integrate easily with modern systems (cloud native and simpler 
multi-tenancy)
   4. Using Arrow as the backbone opens doors for more advanced use cases such 
as ML - it may be efficiently integrated with pandas or TensorFlow through 
Arrow.
   
   We use systems like Spark heavily for analytics and ML, and the points 
above address pain points that make switching worth considering.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
