loic-sharma commented on PR #2622: URL: https://github.com/apache/arrow-datafusion/pull/2622#issuecomment-1141474057
Hello,

I've been watching DataFusion from the sidelines and am interested in using it in a Zig project. As a disclaimer, I'm new to Rust and am not that familiar with its async implementation.

DataFusion's "query engine as a library" is similar to SQLite's "database as a library". SQLite's success is in part due to how easy it is to integrate into *any* project, regardless of the language that project uses. It'd be wonderful if DataFusion were just as easy to use in all projects!

Today, a key challenge is Rust's async. While using blocking tricks is a great short-term solution, it is inefficient (it requires multiple threads) and can cause deadlock issues. Callbacks are not a perfect solution either (what if my language runtime requires stack unwinding?).

In my opinion this would be helped by:

1. **An excellent C API** - Most languages have tooling to integrate with C, so a C API makes it easier to use DataFusion in other languages. While in the short term it's perfectly fine for the C API to be in a separate repository, in the long term I'd recommend having the C API inside the arrow-datafusion repository. This would force new DataFusion API designs to take the C API into consideration as well.
2. **Minimize async APIs** - Rust's async is a barrier to integrating with other languages. Ideally DataFusion should be able to query in-memory data using the caller's thread - similar to SQLite's threading model - without requiring async. DataFusion's APIs should be designed to avoid async unless absolutely necessary. Care should be taken to minimize the virality of async APIs.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
