[GitHub] [arrow-datafusion] andygrove opened a new issue #62: DataFrame.collect() should be extensible

GitBox Sun, 25 Apr 2021 06:45:41 -0700


andygrove opened a new issue #62:
URL: https://github.com/apache/arrow-datafusion/issues/62



   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Ballista provides its own execution context but uses the DataFusion 
DataFrame. Calling `collect` on the DataFrame runs the query in-memory 
rather than distributed, so Ballista users must extract the logical 
plan from the DataFrame and call `BallistaContext.collect` instead. This is 
poor UX.
   
   **Describe the solution you'd like**
   As a user, I would just like to call `DataFrame.collect()` and have it run 
either in-memory or distributed depending on how I created the context.
   
   I think the way to do this is by making it possible to customize 
`ExecutionContext` and override the behavior when a DataFrame is collected.
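
   One possible shape for this, as a rough sketch only: the context owns a 
pluggable backend and hands it to every DataFrame it creates, so `collect` 
delegates to whichever backend the context was built with. All names below 
(`ExecutionBackend`, `InMemoryBackend`, `DistributedBackend`, `QueryResult`, 
`with_backend`, the `scheduler` field) are hypothetical stand-ins and are not 
the existing DataFusion or Ballista API. `LogicalPlan` is reduced to a string, 
and `collect` is synchronous here for brevity.

```rust
use std::sync::Arc;

/// Simplified stand-in for a logical plan (hypothetical; the real type is far richer).
#[derive(Debug, Clone, PartialEq)]
pub struct LogicalPlan {
    pub description: String,
}

/// Simplified stand-in for collected results; real code would return Arrow record batches.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryResult {
    pub executed_by: String,
    pub plan: String,
}

/// Hypothetical extension point: decides how a plan is executed when collected.
pub trait ExecutionBackend {
    fn collect(&self, plan: &LogicalPlan) -> Result<QueryResult, String>;
}

/// Default behavior: run the plan in the local process.
pub struct InMemoryBackend;

impl ExecutionBackend for InMemoryBackend {
    fn collect(&self, plan: &LogicalPlan) -> Result<QueryResult, String> {
        Ok(QueryResult {
            executed_by: "in-memory".to_string(),
            plan: plan.description.clone(),
        })
    }
}

/// What a distributed engine could plug in (assumed; no real network call is made here).
pub struct DistributedBackend {
    pub scheduler: String,
}

impl ExecutionBackend for DistributedBackend {
    fn collect(&self, plan: &LogicalPlan) -> Result<QueryResult, String> {
        Ok(QueryResult {
            executed_by: format!("distributed via {}", self.scheduler),
            plan: plan.description.clone(),
        })
    }
}

/// The DataFrame carries the backend of the context that created it.
pub struct DataFrame {
    plan: LogicalPlan,
    backend: Arc<dyn ExecutionBackend>,
}

impl DataFrame {
    /// Delegates to the backend instead of hard-coding in-memory execution.
    pub fn collect(&self) -> Result<QueryResult, String> {
        self.backend.collect(&self.plan)
    }
}

pub struct ExecutionContext {
    backend: Arc<dyn ExecutionBackend>,
}

impl ExecutionContext {
    /// In-memory execution by default.
    pub fn new() -> Self {
        Self::with_backend(Arc::new(InMemoryBackend))
    }

    /// Lets a downstream project supply its own execution behavior.
    pub fn with_backend(backend: Arc<dyn ExecutionBackend>) -> Self {
        Self { backend }
    }

    /// Builds a DataFrame that shares this context's backend.
    pub fn sql(&self, query: &str) -> DataFrame {
        DataFrame {
            plan: LogicalPlan { description: query.to_string() },
            backend: Arc::clone(&self.backend),
        }
    }
}
```

   With something like this, user code would call `ctx.sql(...).collect()` in 
both cases, and only the construction of the context would differ between 
in-memory and distributed execution.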
   
   **Describe alternatives you've considered**
   None
   
   **Additional context**
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
