[GitHub] [arrow-datafusion] tustvold commented on a change in pull request #2133: Update quarterly roadmap for Q2

GitBox Sat, 02 Apr 2022 07:14:11 -0700


tustvold commented on a change in pull request #2133:
URL: https://github.com/apache/arrow-datafusion/pull/2133#discussion_r841077605




##########
File path: docs/source/specification/quarterly_roadmap.md
##########
@@ -21,52 +21,65 @@
 
 A quarterly roadmap will be published to give the DataFusion community 
visibility into the priorities of the projects contributors. This roadmap is 
not binding.
 
-## 2022 Q1
+## 2022 Q2
 
 ### DataFusion Core
 
-- Publish official Arrow2 branch
-- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+- IO Improvements

Review comment:
       Not entirely sure what this is specifically referring to, but I 
definitely intend to focus on improving the IO and scheduling stories in 
arrow-rs and DataFusion. See https://github.com/apache/arrow-rs/issues/1473 and 
https://github.com/apache/arrow-datafusion/issues/2079. Not sure if we want to 
explicitly call out the scheduling side of this.
   
   I may also get to proper filter pushdown to Parquet if I have time - 
https://github.com/apache/arrow-rs/issues/1191
   
   Edit: I've proposed a change with a very high-level statement of what I hope 
to achieve w.r.t. scheduling.

##########
File path: docs/source/specification/quarterly_roadmap.md
##########
@@ -21,52 +21,65 @@
 
 A quarterly roadmap will be published to give the DataFusion community 
visibility into the priorities of the projects contributors. This roadmap is 
not binding.
 
-## 2022 Q1
+## 2022 Q2
 
 ### DataFusion Core
 
-- Publish official Arrow2 branch
-- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+- IO Improvements
+  - Reading, registering, and writing more file formats from both DataFrame 
API and SQL
+  - Additional options for IO including partitioning and metadata support
+- Memory Management

Review comment:
       ```suggestion
   - Work Scheduling
     - Improve predictability, observability and performance of IO and 
CPU-bound work
     - Develop a more explicit story for managing parallelism during plan 
execution
   - Memory Management
   ```
   
   I've yet to create a ticket for this, as I'm still exploring the problem 
domain, but the precursor discussions can be found at 
https://github.com/apache/arrow-rs/issues/1473 and 
https://github.com/apache/arrow-datafusion/issues/2079.



