GavinRay97 commented on issue #30:
URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133730265

   > To be transparent, my team is building a query engine which is sensitive 
to time-to-first-result latency so we are very interested in fully streaming 
execution (and hoping to upstream as much as we can) but want to make sure that 
this is in line for the desired direction of Ballista for the rest of the 
community.
   
   I also have major use cases for latency-sensitive, potentially multi-source 
queries.
   It boils down to being able to use it for end-user/interactive applications.
   
   One of the biggest bummers to me about Spark is that its architecture 
cripples it for latency-sensitive workloads.
   I wanted to see what the latency was like for a basic two-DB join query 
between in-memory databases:
   - https://github.com/GavinRay97/spark-playground/blob/44a756acaee676a9b0c128466e4ab231a7df8d46/src/main/scala/Application.scala#L80-L112
   
   Something like:
   ```
   SELECT ... FROM db1.foo JOIN db2.bar ON ... LIMIT 1
   ```
   
   Using the latest Spark nightly snapshot, this takes 150-200 ms on my personal 
machine.
   
   A significant portion of this is spent on things that matter for multi-node 
computation but are not required for in-memory execution on a single node 
(serialization, broadcasts, scheduling/coordination).
   
   The codegen + execution time isn't that bad.
   
   Understandably, Spark isn't tailored for this. But there's a lot of great 
technology in there (Catalyst, Tungsten) that is state-of-the-art for query 
optimization and performance, and it's a bummer that you can't configure Spark 
(to my knowledge) for a "local" mode or directly interact with just the pieces 
you need to manually evaluate expressions or do query optimization.
   
   It would be great if the future of Ballista accommodated this. It opens up 
interesting possibilities.
   
   


