tustvold commented on a change in pull request #2133:
URL: https://github.com/apache/arrow-datafusion/pull/2133#discussion_r841077605
##########
File path: docs/source/specification/quarterly_roadmap.md
##########
@@ -21,52 +21,65 @@
A quarterly roadmap will be published to give the DataFusion community
visibility into the priorities of the projects contributors. This roadmap is
not binding.
-## 2022 Q1
+## 2022 Q2
### DataFusion Core
-- Publish official Arrow2 branch
-- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+- IO Improvements
Review comment:
I'm not entirely sure what this refers to specifically, but I
definitely intend to focus on improving the IO and scheduling stories in
arrow-rs and DataFusion. See https://github.com/apache/arrow-rs/issues/1473 and
https://github.com/apache/arrow-datafusion/issues/2079. I'm not sure whether we
want to explicitly call out the scheduling side of this.
I may also get to proper filter pushdown to Parquet if I have time:
https://github.com/apache/arrow-rs/issues/1191
Edit: I've proposed a change with a very high-level statement of what I hope
to achieve w.r.t. scheduling.
##########
File path: docs/source/specification/quarterly_roadmap.md
##########
@@ -21,52 +21,65 @@
A quarterly roadmap will be published to give the DataFusion community
visibility into the priorities of the projects contributors. This roadmap is
not binding.
-## 2022 Q1
+## 2022 Q2
### DataFusion Core
-- Publish official Arrow2 branch
-- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+- IO Improvements
+ - Reading, registering, and writing more file formats from both DataFrame
API and SQL
+ - Additional options for IO including partitioning and metadata support
+- Memory Management
Review comment:
```suggestion
- Work Scheduling
- Improve predictability, observability and performance of IO and
CPU-bound work
- Develop a more explicit story for managing parallelism during plan
execution
- Memory Management
```
I've yet to create a ticket for this, as I'm still exploring the problem
domain, but the precursor discussions can be found at
https://github.com/apache/arrow-rs/issues/1473 and
https://github.com/apache/arrow-datafusion/issues/2079.