houqp commented on issue #1273:
URL: https://github.com/apache/arrow-datafusion/issues/1273#issuecomment-966013277


   > The trouble I am running into is that DataFusion might have too much 
functionality. My csv files are already split up (many per process), and I 
already have processes running on an existing cluster via MPI. So I want to 
execute SQL queries once for each csv file and create a new result dataset 
distributed the same way as the original.
   
   @frobnitzem I think this is a bit off topic and worth discussing in a 
separate issue. To answer your question: you can use DataFusion as a plain 
library to query a single CSV file in-process with SQL. You don't have to use 
Ballista.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


