realno edited a comment on issue #1916: URL: https://github.com/apache/arrow-datafusion/issues/1916#issuecomment-1066188955
> Also, as a standalone system, Ballista will compete with the heavy weights in the category (Spark, Presto..). That is an interesting but very ambitious goal 😄 I feel some opportunities/differentiators for Ballista are the following: 1. non-JVM - this brings a lot of benefit such as lower footprint, memory efficiency, and no GC cost (Rust specific) 2. A chance for more modern design principles - for example Spark was originally architected to best deployed to bare metal, it is hard to make some changes to be more cloud friendly 3. Utilize modern resource management and orchestration technologies - reusing mature tools like k8s will simplify Ballista's implementation (it probably doesn't need a very complex resource management system anymore) and integrate easily with modern systems (cloud native and simpler multi-tenancy) 4. Using Arrow as the backbone opens doors for more advanced use case such as ML - it may be efficiently integrated with Pandas or Tensorflow through Arrow. We heavily use systems like Spark for Analytics and ML, the above points are pain points that worth consider switching. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
