oersted opened a new issue, #4889: URL: https://github.com/apache/arrow-datafusion/issues/4889
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

At present, composing SQL is not very ergonomic. Say one wants to build a non-trivial pipeline with multiple stages by composing functions that each perform transformations on a DataFrame. Right now this is only practical with the DataFrame API, by passing partially transformed DataFrames around and applying further transformations at each stage.

**Describe the solution you'd like**

Simply the ability to run SQL on an existing DataFrame (`DataFrame::sql`), so that a user always has the option to choose between SQL and the DataFrame API in more complex pipelines. I'd suggest registering a temporary table reference with a name like `self`.

**Describe alternatives you've considered**

It might be technically possible to do this by registering intermediate views. However:

* This would only work by staying within SQL for the whole pipeline, since there doesn't seem to be an API for creating a view of a DataFrame either.
* It would require passing a reference to `SessionContext` around everywhere.
* Intermediate views would need globally unique names, and those names would have to be passed around between functions as references, which can be quite error-prone.
* Views would need to be dropped (garbage collected) when they are no longer needed.

To be fair, other similar query engines do not support this either and behave similarly. In Spark, there is the `DataFrame.createGlobalTempView` method, which is a bit more helpful but still means dealing with globally unique names.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
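To make the request concrete, here is a sketch of how the proposed API might be used. Note that `DataFrame::sql` does not exist in DataFusion today; the method, the implicit `self` table name, and the column/table names below are all hypothetical illustrations of this proposal, not real APIs.

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

// A pipeline stage written with the DataFrame API: this style already
// works today, since a DataFrame can be passed around and transformed
// without access to a SessionContext.
fn filter_stage(df: DataFrame) -> Result<DataFrame> {
    df.filter(col("amount").gt(lit(100)))
}

// A pipeline stage written in SQL against the incoming DataFrame: this
// is the proposed `DataFrame::sql`. The query refers to the DataFrame
// itself via the suggested temporary table name `self`, so no globally
// unique view names and no SessionContext need to be threaded through.
async fn aggregate_stage(df: DataFrame) -> Result<DataFrame> {
    df.sql("SELECT category, SUM(amount) AS total FROM self GROUP BY category")
        .await
}
```

With something like this, each stage of a pipeline could independently pick whichever of the two styles fits best, and stages composed by different authors would not need to coordinate on view names.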
