kszlim commented on issue #12357: URL: https://github.com/apache/datafusion/issues/12357#issuecomment-2345293094
One vote here for the other use case. I'd like DataFusion to be usable as a single-node query engine (alongside a nice dataframe API). This is in the works within the datafusion-python bindings, but I'd personally love for this use case to get as much priority as DataFusion's role as a library to build other DB products on top of.

I really think that with a combination of really strong Python bindings (ensuring that all extension points are also appropriately exposed to Python), https://github.com/apache/datafusion/issues/4285, and a lot of work on making the docs and the Python bindings as nice as those of polars, DataFusion could become *the* go-to solution for ETL/OLAP/ML/data engineering/etc. use cases.

DataFusion has a lot of really excellent foundational engineering. The fact that so many downstream DB engines use it attests strongly to that. I think it's a real shame that it isn't quite as suitable for the role that pandas/dask/polars/duckdb currently occupy. This isn't due to anything lacking in the query engine; rather, the overall user experience for a direct user isn't quite as solid as it is for someone using it as a library.
