houqp commented on issue #1273: URL: https://github.com/apache/arrow-datafusion/issues/1273#issuecomment-966013277
> The trouble I am running into is that DataFusion might have too much functionality. My csv files are already split up (many per process), and I already have processes running on an existing cluster via MPI. So I want to execute SQL queries once for each csv file and create a new result dataset distributed the same way as the original.

@frobnitzem I think this is a bit off topic and worth discussing in a separate issue. To answer your question: you can use DataFusion as a plain library to query a single CSV file in-process with SQL. You don't need Ballista for that.