alamb commented on issue #12357: URL: https://github.com/apache/datafusion/issues/12357#issuecomment-2345936261
> DataFusion has a lot of really excellent foundational engineering. How it's used by so many downstream DB engines attests strongly to that. I think it's a real shame that it isn't quite as suitable for the role that pandas/dask/polars/duckdb currently occupies. This isn't due to anything lacking in the query engine, but the overall user experience for a direct user isn't quite as solid (as opposed to someone using it as a library).

Thank you @kszlim -- This is well stated, and I think this is one of the core tensions that has existed in the project from the early days.

One way to go is, as you suggest, to try to make datafusion the superset of all that is good about polars (Python dataframes) and duckdb (SQL). I worry that this will result in an even larger library that isn't as good as either.

Another potential way is to keep the core focused on fundamentals and work to provide open source alternatives to those other libraries *built on* datafusion. That is my not-so-secret goal with the following discussions:

* `polars`: https://github.com/apache/datafusion-python/issues/440 (🙌 @timsaucer)
* `duckdb`: https://github.com/apache/datafusion/issues/11979 (🙌 @matthewmturner)

I am hoping to see datafusion-python (or maybe a library built on datafusion-python) and `dft` evolve into delightful end-user experiences.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
