[arrow-datafusion] branch master updated: Add Roadmap to Documentation (#1104)

alamb Tue, 19 Oct 2021 09:25:38 -0700

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git



The following commit(s) were added to refs/heads/master by this push:
     new ff243a4  Add Roadmap to Documentation (#1104)
ff243a4 is described below

commit ff243a40e84a0bf86b69c976ff0ed317fae6df64
Author: Andrew Lamb <[email protected]>
AuthorDate: Tue Oct 19 12:25:26 2021 -0400

    Add Roadmap to Documentation (#1104)
    
    * Add Roadmap
    
    * Fix English, add comments from xudong963
    
    * Add datafusion-cli thoughts
    
    * add more links
    
    * Apply suggestions from code review
    
    Co-authored-by: Loïc Sharma <[email protected]>
    Co-authored-by: QP Hou <[email protected]>
    
    * Incorporate comments from QP Hou
    
    * prettier
    
    * Update docs/source/specification/roadmap.md
    
    Co-authored-by: Daniël Heres <[email protected]>
    
    * Apply suggestions from code review
    
    Co-authored-by: Carlos <[email protected]>
    Co-authored-by: rdettai <[email protected]>
    
    Co-authored-by: Loïc Sharma <[email protected]>
    Co-authored-by: QP Hou <[email protected]>
    Co-authored-by: Daniël Heres <[email protected]>
    Co-authored-by: Carlos <[email protected]>
    Co-authored-by: rdettai <[email protected]>
---
 README.md                            |  4 ++
 docs/source/index.rst                |  1 +
 docs/source/specification/roadmap.md | 99 ++++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)

diff --git a/README.md b/README.md
index 458f197..e1f96f0 100644
--- a/README.md
+++ b/README.md
@@ -356,6 +356,10 @@ are mapped to Arrow types according to the following table
 | `CUSTOM`      | _Not yet supported_               |
 | `ARRAY`       | _Not yet supported_               |
 
+# Roadmap
+
+Please see [Roadmap](docs/source/specification/roadmap.md) for information on where the project is headed.
+
 # Architecture Overview
 
 There is no formal document describing DataFusion's architecture yet, but the following presentations offer a good overview of its different components and how they interact together.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 6956d0b..bf6b250 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -52,6 +52,7 @@ Table of content
    :maxdepth: 1
    :caption: Specification
 
+   specification/roadmap
    specification/invariants
    specification/output-field-name-semantic
 
diff --git a/docs/source/specification/roadmap.md b/docs/source/specification/roadmap.md
new file mode 100644
index 0000000..520815b
--- /dev/null
+++ b/docs/source/specification/roadmap.md
@@ -0,0 +1,99 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Roadmap
+
+This document describes the high-level goals of the DataFusion and
+Ballista development community. It is not meant to restrict
+possibilities, but rather to help newcomers understand the broader
+context of where the community is headed, and to inspire
+additional contributions.
+
+DataFusion and Ballista are part of the [Apache
+Arrow](https://arrow.apache.org/) project and follow the Apache
+Software Foundation governance model. These projects are entirely
+driven by volunteers, and we welcome contributions for items not on
+this roadmap. However, before submitting a large PR, we strongly
+suggest you start a conversation using a GitHub issue or the
[email protected] mailing list to make review efficient and avoid
+surprises.
+
+# DataFusion
+
+DataFusion's goal is to become the embedded query engine of choice
+for new analytic applications, by leveraging the unique features of
+[Rust](https://www.rust-lang.org/) and [Apache Arrow](https://arrow.apache.org/)
+to provide:
+
+1. Best-in-class single-node query performance
+2. A declarative SQL query interface compatible with PostgreSQL
+3. A DataFrame API, similar to those offered by Pandas and Spark
+4. A procedural API for programmatically creating and running execution plans
+5. High-performance, data-race-free, ergonomic extensibility points at every layer
+
+## Additional SQL Language Features
+
+- Complete support for the features listed under [status](https://github.com/apache/arrow-datafusion/blob/master/README.md#status)
+- Timestamp arithmetic [#194](https://github.com/apache/arrow-datafusion/issues/194)
+- SQL parser extension point [#533](https://github.com/apache/arrow-datafusion/issues/533)
+- Support for nested structures (fields, lists, structs) [#119](https://github.com/apache/arrow-datafusion/issues/119)
+- Remaining set operators (`INTERSECT` / `EXCEPT`) [#1082](https://github.com/apache/arrow-datafusion/issues/1082)
+- Run all queries from the TPC-H benchmark (see [milestone](https://github.com/apache/arrow-datafusion/milestone/2) for more details)
+
+## Query Optimizer
+
+- Additional constant folding / partial evaluation [#1070](https://github.com/apache/arrow-datafusion/issues/1070)
+- More sophisticated cost-based optimizer for join ordering
+- Implement an advanced query optimization framework (Tokomak) [#440](https://github.com/apache/arrow-datafusion/issues/440)
+
+## Datasources
+
+- Better support for reading data from remote filesystems (e.g. S3) without caching it locally [#907](https://github.com/apache/arrow-datafusion/issues/907) [#1060](https://github.com/apache/arrow-datafusion/issues/1060)
+- Support for partitioned datasources [#1139](https://github.com/apache/arrow-datafusion/issues/1139), making it simpler to integrate other table formats (Delta, Iceberg, ...)
+- Improve the performance of file format datasources (parallelize file listings, async Arrow readers, file chunk prefetching, ...)
+
+## Runtime / Infrastructure
+
+- Migrate to some sort of arrow2-based implementation (see [milestone](https://github.com/apache/arrow-datafusion/milestone/3) for more details)
+- Add DataFusion to h2oai/db-benchmark [#147](https://github.com/apache/arrow-datafusion/issues/147)
+- Improve build time [#348](https://github.com/apache/arrow-datafusion/issues/348)
+
+## Resource Management
+
+- Finer-grained control and limits on runtime memory [#587](https://github.com/apache/arrow-datafusion/issues/587) and CPU usage [#54](https://github.com/apache/arrow-datafusion/issues/64)
+
+## Python Interface
+
+TBD
+
+## DataFusion CLI (`datafusion-cli`)
+
+Note: There are some additional thoughts on a datafusion-cli vision in [#1096](https://github.com/apache/arrow-datafusion/issues/1096#issuecomment-939418770).
+
+- Better abstraction between REPL parsing and queries, so that commands are separated and handled correctly
+- Connect to the `Statistics` subsystem and have the CLI print out more stats for query debugging, etc.
+- Improved error handling for interactive use and shell scripting
+- Publish to apt, brew, and possibly a NuGet registry so that people can install it more easily
+- Adopt a shorter name, like `dfcli`?
+
+# Ballista
+
+## Vision
+
+TBD

