This is an automated email from the ASF dual-hosted git repository. houqp pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push: new dcd34c6 consolidate datafusion docs with sphinx (#993) dcd34c6 is described below commit dcd34c667937472560ebdf29098489ef8cf62147 Author: QP Hou <q...@scribd.com> AuthorDate: Tue Sep 14 22:16:31 2021 -0700 consolidate datafusion docs with sphinx (#993) * consolidate datafusion docs with sphinx * added python doc * combined, cli, user-guide and specification docs into a single datafusion doc Co-authored-by: Jiayu Liu <jimex...@users.noreply.github.com> --- README.md | 2 +- datafusion/docs/cli.md | 102 ----------- docs/{user-guide/book.toml => .gitignore} | 9 +- docs/{user-guide/book.toml => Makefile} | 27 ++- docs/{user-guide => }/README.md | 16 +- docs/make.bat | 52 ++++++ docs/{user-guide/book.toml => requirements.txt} | 10 +- .../images/DataFusion-Logo-Background-White.png | Bin .../images/DataFusion-Logo-Background-White.svg | 0 .../_static}/images/DataFusion-Logo-Dark.png | Bin .../_static}/images/DataFusion-Logo-Dark.svg | 0 .../_static}/images/DataFusion-Logo-Light.png | Bin .../_static}/images/DataFusion-Logo-Light.svg | 0 docs/source/_static/theme_overrides.css | 93 ++++++++++ docs/source/_templates/docs-sidebar.html | 19 ++ docs/source/_templates/layout.html | 5 + docs/source/cli/index.rst | 113 ++++++++++++ docs/source/conf.py | 100 +++++++++++ docs/source/index.rst | 65 +++++++ docs/source/python/api.rst | 30 ++++ docs/source/python/api/dataframe.rst | 27 +++ docs/source/python/api/execution_context.rst | 27 +++ docs/source/python/api/expression.rst | 27 +++ docs/source/python/api/functions.rst | 27 +++ docs/source/python/index.rst | 192 +++++++++++++++++++++ docs/{ => source}/specification/invariants.md | 0 .../specification/output-field-name-semantic.md | 0 docs/{user-guide/src => source/user-guide}/cli.md | 8 +- .../user-guide/distributed/clients/index.rst | 25 +++ .../user-guide/distributed/clients/python.md} | 0 .../user-guide/distributed/clients/rust.md} | 2 +- 
.../distributed/deployment}/cargo-install.md | 2 +- .../distributed/deployment}/configuration.md | 0 .../distributed/deployment}/docker-compose.md | 0 .../user-guide/distributed/deployment}/docker.md | 2 +- .../user-guide/distributed/deployment/index.rst | 29 ++++ .../distributed/deployment}/kubernetes.md | 0 .../distributed/deployment}/raspberrypi.md | 0 docs/source/user-guide/distributed/index.rst | 26 +++ .../user-guide}/distributed/introduction.md | 2 +- .../src => source/user-guide}/example-usage.md | 0 docs/{user-guide/src => source/user-guide}/faq.md | 0 .../src => source/user-guide}/introduction.md | 2 +- .../src => source/user-guide}/library.md | 4 +- .../user-guide}/sql/datafusion-functions.md | 0 .../src => source/user-guide}/sql/ddl.md | 0 docs/source/user-guide/sql/index.rst | 26 +++ .../src => source/user-guide}/sql/select.md | 18 +- docs/user-guide/.gitignore | 1 - docs/user-guide/src/SUMMARY.md | 43 ----- docs/user-guide/src/distributed/clients.md | 23 --- docs/user-guide/src/distributed/deployment.md | 28 --- docs/user-guide/src/sql/introduction.md | 20 --- 53 files changed, 943 insertions(+), 261 deletions(-) diff --git a/README.md b/README.md index ed2788c..b9253cd 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ # DataFusion -<img src="datafusion/docs/images/DataFusion-Logo-Background-White.svg" width="256"/> +<img src="docs/source/_static/images/DataFusion-Logo-Background-White.svg" width="256"/> DataFusion is an extensible query execution framework, written in Rust, that uses [Apache Arrow](https://arrow.apache.org) as its diff --git a/datafusion/docs/cli.md b/datafusion/docs/cli.md deleted file mode 100644 index d62dcdd..0000000 --- a/datafusion/docs/cli.md +++ /dev/null @@ -1,102 +0,0 @@ -<!--- - Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. 
The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. ---> - -# DataFusion CLI - -The DataFusion CLI is a command-line interactive SQL utility that allows queries to be executed against CSV and Parquet files. It is a convenient way to try DataFusion out with your own data sources. - -## Run using Cargo - -Use the following commands to clone this repository and run the CLI. This will require the Rust toolchain to be installed. Rust can be installed from [https://rustup.rs/](https://rustup.rs/). - -```bash -git clone https://github.com/apache/arrow-datafusion -cd arrow-datafusion/datafusion-cli -cargo run --release -``` - -## Run using Docker - -Use the following commands to clone this repository and build a Docker image containing the CLI tool. Note that there is `.dockerignore` file in the root of the repository that may need to be deleted in order for this to work. - -```bash -git clone https://github.com/apache/arrow-datafusion -cd arrow-datafusion -docker build -f datafusion-cli/Dockerfile . --tag datafusion-cli -docker run -it -v $(your_data_location):/data datafusion-cli -``` - -## Usage - -``` -DataFusion 4.0.0-SNAPSHOT -DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries -against CSV and Parquet files as well as querying directly against in-memory data. 
- -USAGE: - datafusion-cli [FLAGS] [OPTIONS] - -FLAGS: - -h, --help Prints help information - -q, --quiet Reduce printing other than the results and work quietly - -V, --version Prints version information - -OPTIONS: - -c, --batch-size <batch-size> The batch size of each query, or use DataFusion default - -p, --data-path <data-path> Path to your data, default to current directory - -f, --file <file> Execute commands from file, then exit - --format <format> Output format [default: table] [possible values: csv, tsv, table, json, ndjson] -``` - -Type `exit` or `quit` to exit the CLI. - -## Registering Parquet Data Sources - -Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files. - -```sql -CREATE EXTERNAL TABLE taxi -STORED AS PARQUET -LOCATION '/mnt/nyctaxi/tripdata.parquet'; -``` - -## Registering CSV Data Sources - -CSV data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is necessary to provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL to query CSV files. - -```sql -CREATE EXTERNAL TABLE test ( - c1 VARCHAR NOT NULL, - c2 INT NOT NULL, - c3 SMALLINT NOT NULL, - c4 SMALLINT NOT NULL, - c5 INT NOT NULL, - c6 BIGINT NOT NULL, - c7 SMALLINT NOT NULL, - c8 INT NOT NULL, - c9 BIGINT NOT NULL, - c10 VARCHAR NOT NULL, - c11 FLOAT NOT NULL, - c12 DOUBLE NOT NULL, - c13 VARCHAR NOT NULL -) -STORED AS CSV -WITH HEADER ROW -LOCATION '/path/to/aggregate_test_100.csv'; -``` diff --git a/docs/user-guide/book.toml b/docs/.gitignore similarity index 87% copy from docs/user-guide/book.toml copy to docs/.gitignore index efb9212..765c378 100644 --- a/docs/user-guide/book.toml +++ b/docs/.gitignore @@ -15,9 +15,6 @@ # specific language governing permissions and limitations # under the License. 
-[book] -authors = ["Apache Arrow"] -language = "en" -multilingual = false -src = "src" -title = "DataFusion User Guide" +build +source/python/generated +venv/ diff --git a/docs/user-guide/book.toml b/docs/Makefile similarity index 55% copy from docs/user-guide/book.toml copy to docs/Makefile index efb9212..6bce199 100644 --- a/docs/user-guide/book.toml +++ b/docs/Makefile @@ -15,9 +15,24 @@ # specific language governing permissions and limitations # under the License. -[book] -authors = ["Apache Arrow"] -language = "en" -multilingual = false -src = "src" -title = "DataFusion User Guide" +# +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/user-guide/README.md b/docs/README.md similarity index 71% rename from docs/user-guide/README.md rename to docs/README.md index 6698e56..4aa9ea9 100644 --- a/docs/user-guide/README.md +++ b/docs/README.md @@ -17,15 +17,19 @@ under the License. --> -# DataFusion User Guide Source +# DataFusion docs -This directory contains the sources for the DataFusion user guide. +## Dependencies -## Generate HTML +It's recommended to install build dependencies and build the documentation +inside a Python virtualenv. -To generate the user guide in HTML format, run the following commands: +- Python +- `pip install -r requirements.txt` +- Datafusion python package. 
You can install the latest version by running `maturin develop` inside `../python` directory. + +## Build ```bash -cargo install mdbook -mdbook build +make html ``` diff --git a/docs/make.bat b/docs/make.bat new file mode 100644 index 0000000..ded5b4a --- /dev/null +++ b/docs/make.bat @@ -0,0 +1,52 @@ +@rem Licensed to the Apache Software Foundation (ASF) under one +@rem or more contributor license agreements. See the NOTICE file +@rem distributed with this work for additional information +@rem regarding copyright ownership. The ASF licenses this file +@rem to you under the Apache License, Version 2.0 (the +@rem "License"); you may not use this file except in compliance +@rem with the License. You may obtain a copy of the License at +@rem +@rem http://www.apache.org/licenses/LICENSE-2.0 +@rem +@rem Unless required by applicable law or agreed to in writing, +@rem software distributed under the License is distributed on an +@rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +@rem KIND, either express or implied. See the License for the +@rem specific language governing permissions and limitations +@rem under the License. + +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=source +set BUILDDIR=build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. 
+ echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/user-guide/book.toml b/docs/requirements.txt similarity index 87% rename from docs/user-guide/book.toml rename to docs/requirements.txt index efb9212..0f18a11 100644 --- a/docs/user-guide/book.toml +++ b/docs/requirements.txt @@ -15,9 +15,7 @@ # specific language governing permissions and limitations # under the License. -[book] -authors = ["Apache Arrow"] -language = "en" -multilingual = false -src = "src" -title = "DataFusion User Guide" +sphinx==2.4.4 +pydata-sphinx-theme +myst-parser<1 +maturin<0.12 diff --git a/datafusion/docs/images/DataFusion-Logo-Background-White.png b/docs/source/_static/images/DataFusion-Logo-Background-White.png similarity index 100% rename from datafusion/docs/images/DataFusion-Logo-Background-White.png rename to docs/source/_static/images/DataFusion-Logo-Background-White.png diff --git a/datafusion/docs/images/DataFusion-Logo-Background-White.svg b/docs/source/_static/images/DataFusion-Logo-Background-White.svg similarity index 100% rename from datafusion/docs/images/DataFusion-Logo-Background-White.svg rename to docs/source/_static/images/DataFusion-Logo-Background-White.svg diff --git a/datafusion/docs/images/DataFusion-Logo-Dark.png b/docs/source/_static/images/DataFusion-Logo-Dark.png similarity index 100% rename from datafusion/docs/images/DataFusion-Logo-Dark.png rename to docs/source/_static/images/DataFusion-Logo-Dark.png diff --git a/datafusion/docs/images/DataFusion-Logo-Dark.svg b/docs/source/_static/images/DataFusion-Logo-Dark.svg similarity index 100% rename from datafusion/docs/images/DataFusion-Logo-Dark.svg rename to docs/source/_static/images/DataFusion-Logo-Dark.svg diff --git a/datafusion/docs/images/DataFusion-Logo-Light.png 
b/docs/source/_static/images/DataFusion-Logo-Light.png similarity index 100% rename from datafusion/docs/images/DataFusion-Logo-Light.png rename to docs/source/_static/images/DataFusion-Logo-Light.png diff --git a/datafusion/docs/images/DataFusion-Logo-Light.svg b/docs/source/_static/images/DataFusion-Logo-Light.svg similarity index 100% rename from datafusion/docs/images/DataFusion-Logo-Light.svg rename to docs/source/_static/images/DataFusion-Logo-Light.svg diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css new file mode 100644 index 0000000..1e972cc --- /dev/null +++ b/docs/source/_static/theme_overrides.css @@ -0,0 +1,93 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + + +/* Customizing with theme CSS variables */ + +:root { + --pst-color-active-navigation: 215, 70, 51; + --pst-color-link-hover: 215, 70, 51; + --pst-color-headerlink: 215, 70, 51; + /* Use normal text color (like h3, ..) 
instead of primary color */ + --pst-color-h1: var(--color-text-base); + --pst-color-h2: var(--color-text-base); + /* Use softer blue from bootstrap's default info color */ + --pst-color-info: 23, 162, 184; + --pst-header-height: 0px; +} + +code { + color: rgb(215, 70, 51); +} + +.footer { + text-align: center; +} + +/* Ensure the logo is properly displayed */ + +.navbar-brand { + height: auto; + width: auto; +} + +a.navbar-brand img { + height: auto; + width: auto; + max-height: 15vh; + max-width: 100%; +} + + +/* This is the bootstrap CSS style for "table-striped". Since the theme does +not yet provide an easy way to configure this globally, it is easier to simply +include this snippet here than updating each table in all rst files to +add ":class: table-striped" */ + +.table tbody tr:nth-of-type(odd) { + background-color: rgba(0, 0, 0, 0.05); +} + + +/* Limit the max height of the sidebar navigation section. Because in our +customized template, there is more content above the navigation, i.e. +larger logo: if we don't decrease the max-height, it will overlap with +the footer. 
+Details: min(15vh, 110px) for the logo size, 8rem for search box etc*/ + +@media (min-width:720px) { + @supports (position:-webkit-sticky) or (position:sticky) { + .bd-links { + max-height: calc(100vh - min(15vh, 110px) - 8rem) + } + } +} + + +/* Fix table text wrapping in RTD theme, + * see https://rackerlabs.github.io/docs-rackspace/tools/rtd-tables.html + */ + +@media screen { + table.docutils td { + /* !important prevents the common CSS stylesheets from overriding + this as on RTD they are loaded after this stylesheet */ + white-space: normal !important; + } +} diff --git a/docs/source/_templates/docs-sidebar.html b/docs/source/_templates/docs-sidebar.html new file mode 100644 index 0000000..bc2bf00 --- /dev/null +++ b/docs/source/_templates/docs-sidebar.html @@ -0,0 +1,19 @@ + +<a class="navbar-brand" href="{{ pathto(master_doc) }}"> + <img src="{{ pathto('_static/images/' + logo, 1) }}" class="logo" alt="logo"> +</a> + +<form class="bd-search d-flex align-items-center" action="{{ pathto('search') }}" method="get"> + <i class="icon fas fa-search"></i> + <input type="search" class="form-control" name="q" id="search-input" placeholder="{{ theme_search_bar_text }}" aria-label="{{ theme_search_bar_text }}" autocomplete="off" > +</form> + +<nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation"> + <div class="bd-toc-item active"> + {% if "python/api" in pagename or "python/generated" in pagename %} + {{ generate_nav_html("sidebar", startdepth=0, maxdepth=3, collapse=False, includehidden=True, titles_only=True) }} + {% else %} + {{ generate_nav_html("sidebar", startdepth=0, maxdepth=4, collapse=False, includehidden=True, titles_only=True) }} + {% endif %} + </div> +</nav> diff --git a/docs/source/_templates/layout.html b/docs/source/_templates/layout.html new file mode 100644 index 0000000..a9d0f30 --- /dev/null +++ b/docs/source/_templates/layout.html @@ -0,0 +1,5 @@ +{% extends "pydata_sphinx_theme/layout.html" %} + +{# Silence the navbar #} +{% block 
docs_navbar %} +{% endblock %} diff --git a/docs/source/cli/index.rst b/docs/source/cli/index.rst new file mode 100644 index 0000000..93ae173 --- /dev/null +++ b/docs/source/cli/index.rst @@ -0,0 +1,113 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +======================= +DataFusion Command-line +======================= + +The Arrow DataFusion CLI is a command-line interactive SQL utility that allows +queries to be executed against CSV and Parquet files. It is a convenient way to +try DataFusion out with your own data sources. + +Run using Cargo +=============== + +Use the following commands to clone this repository and run the CLI. This will require the Rust toolchain to be installed. Rust can be installed from `https://rustup.rs <https://rustup.rs/>`_. + +.. code-block:: bash + + git clone https://github.com/apache/arrow-datafusion + cd arrow-datafusion/datafusion-cli + cargo run --release + + +Run using Docker +================ + +Use the following commands to clone this repository and build a Docker image containing the CLI tool. Note that there is a :code:`.dockerignore` file in the root of the repository that may need to be deleted in order for this to work. + +.. 
code-block:: bash + + git clone https://github.com/apache/arrow-datafusion + cd arrow-datafusion + docker build -f datafusion-cli/Dockerfile . --tag datafusion-cli + docker run -it -v $(your_data_location):/data datafusion-cli + + +Usage +===== + +.. code-block:: bash + + DataFusion 5.0.0-SNAPSHOT + DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries + against CSV and Parquet files as well as querying directly against in-memory data. + + USAGE: + datafusion-cli [FLAGS] [OPTIONS] + + FLAGS: + -h, --help Prints help information + -q, --quiet Reduce printing other than the results and work quietly + -V, --version Prints version information + + OPTIONS: + -c, --batch-size <batch-size> The batch size of each query, or use DataFusion default + -p, --data-path <data-path> Path to your data, default to current directory + -f, --file <file> Execute commands from file, then exit + --format <format> Output format [default: table] [possible values: csv, tsv, table, json, ndjson] + +Type `exit` or `quit` to exit the CLI. + + +Registering Parquet Data Sources +================================ + +Parquet data sources can be registered by executing a :code:`CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files. + +.. code-block:: sql + + CREATE EXTERNAL TABLE taxi + STORED AS PARQUET + LOCATION '/mnt/nyctaxi/tripdata.parquet'; + + +Registering CSV Data Sources +============================ + +CSV data sources can be registered by executing a :code:`CREATE EXTERNAL TABLE` SQL statement. It is necessary to provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL to query CSV files. + +.. 
code-block:: sql + + CREATE EXTERNAL TABLE test ( + c1 VARCHAR NOT NULL, + c2 INT NOT NULL, + c3 SMALLINT NOT NULL, + c4 SMALLINT NOT NULL, + c5 INT NOT NULL, + c6 BIGINT NOT NULL, + c7 SMALLINT NOT NULL, + c8 INT NOT NULL, + c9 BIGINT NOT NULL, + c10 VARCHAR NOT NULL, + c11 FLOAT NOT NULL, + c12 DOUBLE NOT NULL, + c13 VARCHAR NOT NULL + ) + STORED AS CSV + WITH HEADER ROW + LOCATION '/path/to/aggregate_test_100.csv'; diff --git a/docs/source/conf.py b/docs/source/conf.py new file mode 100644 index 0000000..7971239 --- /dev/null +++ b/docs/source/conf.py @@ -0,0 +1,100 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. 
+# +# import os +# import sys +# sys.path.insert(0, os.path.abspath('.')) + +import datafusion + +# -- Project information ----------------------------------------------------- + +project = 'Arrow Datafusion' +copyright = '2021, Apache Software Foundation' +author = 'Arrow Datafusion Authors' + + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.autosummary', + 'sphinx.ext.doctest', + 'sphinx.ext.ifconfig', + 'sphinx.ext.mathjax', + 'sphinx.ext.viewcode', + 'sphinx.ext.napoleon', + 'myst_parser', +] + +source_suffix = { + '.rst': 'restructuredtext', + '.md': 'markdown', +} + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = [] + +# Show members for classes in .. autosummary +autodoc_default_options = { + "members": None, + "undoc-members": None, + "show-inheritance": None, + "inherited-members": None, +} + +autosummary_generate = True + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = 'pydata_sphinx_theme' + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". 
+html_static_path = ['_static'] + +html_logo = "_static/images/DataFusion-Logo-Background-White.png" + +html_css_files = ["theme_overrides.css"] + +html_sidebars = { + "**": ["docs-sidebar.html"], +} diff --git a/docs/source/index.rst b/docs/source/index.rst new file mode 100644 index 0000000..eeb89d0 --- /dev/null +++ b/docs/source/index.rst @@ -0,0 +1,65 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +======================= +Apache Arrow Datafusion +======================= + +Table of contents +================= + +.. _toc.usage: + +.. toctree:: + :maxdepth: 1 + :caption: Supported Environments + + Rust <https://docs.rs/crate/datafusion/> + Python <python/index> + Command line <cli/index> + +.. _toc.guide: + +.. toctree:: + :maxdepth: 1 + :caption: User Guide + + user-guide/introduction + user-guide/example-usage + user-guide/library + user-guide/cli + user-guide/sql/index + user-guide/distributed/index + user-guide/faq + +.. _toc.specs: + +.. toctree:: + :maxdepth: 1 + :caption: Specification + + specification/invariants + specification/output-field-name-semantic + +.. _toc.readme: + +.. 
toctree:: + :maxdepth: 1 + :caption: README + + Datafusion <https://github.com/apache/arrow-datafusion/blob/master/README.md> + Ballista <https://github.com/apache/arrow-datafusion/tree/master/ballista/README.md> diff --git a/docs/source/python/api.rst b/docs/source/python/api.rst new file mode 100644 index 0000000..f81753e --- /dev/null +++ b/docs/source/python/api.rst @@ -0,0 +1,30 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. _api: + +************* +API Reference +************* + +.. toctree:: + :maxdepth: 2 + + api/dataframe + api/execution_context + api/expression + api/functions diff --git a/docs/source/python/api/dataframe.rst b/docs/source/python/api/dataframe.rst new file mode 100644 index 0000000..0a3c4c8 --- /dev/null +++ b/docs/source/python/api/dataframe.rst @@ -0,0 +1,27 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. 
You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. _api.dataframe: +.. currentmodule:: datafusion + +DataFrame +========= + +.. autosummary:: + :toctree: ../generated/ + + DataFrame diff --git a/docs/source/python/api/execution_context.rst b/docs/source/python/api/execution_context.rst new file mode 100644 index 0000000..7f8c840 --- /dev/null +++ b/docs/source/python/api/execution_context.rst @@ -0,0 +1,27 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. _api.execution_context: +.. currentmodule:: datafusion + +ExecutionContext +================ + +.. autosummary:: + :toctree: ../generated/ + + ExecutionContext diff --git a/docs/source/python/api/expression.rst b/docs/source/python/api/expression.rst new file mode 100644 index 0000000..45923fb --- /dev/null +++ b/docs/source/python/api/expression.rst @@ -0,0 +1,27 @@ +.. 
Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. _api.expression: +.. currentmodule:: datafusion + +Expression +========== + +.. autosummary:: + :toctree: ../generated/ + + Expression diff --git a/docs/source/python/api/functions.rst b/docs/source/python/api/functions.rst new file mode 100644 index 0000000..6f10d82 --- /dev/null +++ b/docs/source/python/api/functions.rst @@ -0,0 +1,27 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. _api.functions: +.. 
currentmodule:: datafusion + +Functions +========= + +.. autosummary:: + :toctree: ../generated/ + + functions diff --git a/docs/source/python/index.rst b/docs/source/python/index.rst new file mode 100644 index 0000000..56f9097 --- /dev/null +++ b/docs/source/python/index.rst @@ -0,0 +1,192 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +==================== +DataFusion in Python +==================== + +This is a Python library that binds to `Apache Arrow <https://arrow.apache.org/>`_ in-memory query engine `DataFusion <https://github.com/apache/arrow/tree/master/rust/datafusion>`_. + +Like pyspark, it allows you to build a plan through SQL or a DataFrame API against in-memory data, parquet or CSV files, run it in a multi-threaded environment, and obtain the result back in Python. + +It also allows you to use UDFs and UDAFs for complex operations. + +The major advantage of this library over other execution engines is that this library achieves zero-copy between Python and its execution engine: there is no cost in using UDFs, UDAFs, and collecting the results to Python apart from having to lock the GIL when running those operations. 
+ +Its query engine, DataFusion, is written in `Rust <https://www.rust-lang.org>`_, which makes strong assumptions about thread safety and lack of memory leaks. + +Technically, zero-copy is achieved via the `C data interface <https://arrow.apache.org/docs/format/CDataInterface.html>`_. + +How to use it +============= + +Simple usage: + +.. code-block:: python + + import datafusion + import pyarrow + + # an alias + f = datafusion.functions + + # create a context + ctx = datafusion.ExecutionContext() + + # create a RecordBatch and a new DataFrame from it + batch = pyarrow.RecordBatch.from_arrays( + [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])], + names=["a", "b"], + ) + df = ctx.create_dataframe([[batch]]) + + # create a new statement + df = df.select( + f.col("a") + f.col("b"), + f.col("a") - f.col("b"), + ) + + # execute and collect the first (and only) batch + result = df.collect()[0] + + assert result.column(0) == pyarrow.array([5, 7, 9]) + assert result.column(1) == pyarrow.array([-3, -3, -3]) + + +UDFs +---- + +.. code-block:: python + + def is_null(array: pyarrow.Array) -> pyarrow.Array: + return array.is_null() + + udf = f.udf(is_null, [pyarrow.int64()], pyarrow.bool_()) + + df = df.select(udf(f.col("a"))) + + +UDAF +---- + +.. code-block:: python + + import pyarrow + import pyarrow.compute + + + class Accumulator: + """ + Interface of a user-defined accumulation. + """ + def __init__(self): + self._sum = pyarrow.scalar(0.0) + + def to_scalars(self) -> [pyarrow.Scalar]: + return [self._sum] + + def update(self, values: pyarrow.Array) -> None: + # not nice since pyarrow scalars can't be summed yet. This breaks on `None` + self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(values).as_py()) + + def merge(self, states: pyarrow.Array) -> None: + # not nice since pyarrow scalars can't be summed yet.
This breaks on `None` + self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(states).as_py()) + + def evaluate(self) -> pyarrow.Scalar: + return self._sum + + + df = ... + + udaf = f.udaf(Accumulator, pyarrow.float64(), pyarrow.float64(), [pyarrow.float64()]) + + df = df.aggregate( + [], + [udaf(f.col("a"))] + ) + + +How to install (from pip) +========================= + +.. code-block:: shell + + pip install datafusion + + +How to develop +============== + +This assumes that you have rust and cargo installed. We use the workflow recommended by `pyo3 <https://github.com/PyO3/pyo3>`_ and `maturin <https://github.com/PyO3/maturin>`_. + +Bootstrap: + +.. code-block:: shell + + # fetch this repo + git clone g...@github.com:apache/arrow-datafusion.git + + cd arrow-datafusion/python + + # prepare development environment (used to build wheel / install in development) + python3 -m venv venv + # activate the venv + source venv/bin/activate + pip install -r requirements.txt + + +Whenever rust code changes (your changes or via `git pull`): + +.. code-block:: shell + + # make sure you activate the venv using "source venv/bin/activate" first + maturin develop + python -m pytest + + +How to update dependencies +========================== + +To change test dependencies, change the `requirements.in` and run + +.. code-block:: shell + + # install pip-tools (this only needs to be done once), also consider running in venv + pip install pip-tools + + # change requirements.in and then run + pip-compile --generate-hashes + + +To update dependencies, run + +.. code-block:: shell + + pip-compile --upgrade + + +More details about pip-tools `here <https://github.com/jazzband/pip-tools>`_ + + +API reference +============= + +..
toctree:: + :maxdepth: 2 + + api diff --git a/docs/specification/invariants.md b/docs/source/specification/invariants.md similarity index 100% rename from docs/specification/invariants.md rename to docs/source/specification/invariants.md diff --git a/docs/specification/output-field-name-semantic.md b/docs/source/specification/output-field-name-semantic.md similarity index 100% rename from docs/specification/output-field-name-semantic.md rename to docs/source/specification/output-field-name-semantic.md diff --git a/docs/user-guide/src/cli.md b/docs/source/user-guide/cli.md similarity index 97% rename from docs/user-guide/src/cli.md rename to docs/source/user-guide/cli.md index 28716b6..cb95fba 100644 --- a/docs/user-guide/src/cli.md +++ b/docs/source/user-guide/cli.md @@ -22,7 +22,7 @@ The DataFusion CLI allows SQL queries to be executed by an in-process DataFusion context, or by a distributed Ballista context. -```ignore +``` USAGE: datafusion-cli [FLAGS] [OPTIONS] @@ -44,11 +44,11 @@ OPTIONS: Create a CSV file to query. -```bash,ignore +```bash $ echo "1,2" > data.csv ``` -```sql,ignore +```bash $ datafusion-cli DataFusion CLI v5.1.0-SNAPSHOT @@ -69,6 +69,6 @@ DataFusion CLI v5.1.0-SNAPSHOT The DataFusion CLI can also connect to a Ballista scheduler for query execution. -```bash,ignore +```bash datafusion-cli --host localhost --port 50050 ``` diff --git a/docs/source/user-guide/distributed/clients/index.rst b/docs/source/user-guide/distributed/clients/index.rst new file mode 100644 index 0000000..c9eb1e1 --- /dev/null +++ b/docs/source/user-guide/distributed/clients/index.rst @@ -0,0 +1,25 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. 
with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +Clients +======= + +.. toctree:: + :maxdepth: 2 + + rust + python diff --git a/docs/user-guide/src/distributed/client-python.md b/docs/source/user-guide/distributed/clients/python.md similarity index 100% rename from docs/user-guide/src/distributed/client-python.md rename to docs/source/user-guide/distributed/clients/python.md diff --git a/docs/user-guide/src/distributed/client-rust.md b/docs/source/user-guide/distributed/clients/rust.md similarity index 99% rename from docs/user-guide/src/distributed/client-rust.md rename to docs/source/user-guide/distributed/clients/rust.md index 4e6ecf5..ccf19aa 100644 --- a/docs/user-guide/src/distributed/client-rust.md +++ b/docs/source/user-guide/distributed/clients/rust.md @@ -17,7 +17,7 @@ under the License. --> -## Ballista Rust Client +# Ballista Rust Client Ballista usage is very similar to DataFusion. The main difference is that the starting point is a `BallistaContext` instead of the DataFusion `ExecutionContext`. Ballista uses the same DataFrame API as DataFusion. diff --git a/docs/user-guide/src/distributed/cargo-install.md b/docs/source/user-guide/distributed/deployment/cargo-install.md similarity index 96% rename from docs/user-guide/src/distributed/cargo-install.md rename to docs/source/user-guide/distributed/deployment/cargo-install.md index 504154d..22a38d7 100644 --- a/docs/user-guide/src/distributed/cargo-install.md +++ b/docs/source/user-guide/distributed/deployment/cargo-install.md @@ -17,7 +17,7 @@ under the License.
--> -## Deploying a standalone Ballista cluster using cargo install +# Deploying a standalone Ballista cluster using cargo install A simple way to start a local cluster for testing purposes is to use cargo to install the scheduler and executor crates. diff --git a/docs/user-guide/src/distributed/configuration.md b/docs/source/user-guide/distributed/deployment/configuration.md similarity index 100% rename from docs/user-guide/src/distributed/configuration.md rename to docs/source/user-guide/distributed/deployment/configuration.md diff --git a/docs/user-guide/src/distributed/docker-compose.md b/docs/source/user-guide/distributed/deployment/docker-compose.md similarity index 100% rename from docs/user-guide/src/distributed/docker-compose.md rename to docs/source/user-guide/distributed/deployment/docker-compose.md diff --git a/docs/user-guide/src/distributed/docker.md b/docs/source/user-guide/distributed/deployment/docker.md similarity index 98% rename from docs/user-guide/src/distributed/docker.md rename to docs/source/user-guide/distributed/deployment/docker.md index 4892ab8..541a884 100644 --- a/docs/user-guide/src/distributed/docker.md +++ b/docs/source/user-guide/distributed/deployment/docker.md @@ -17,7 +17,7 @@ under the License. --> -## Starting a Ballista cluster using Docker +# Starting a Ballista cluster using Docker ## Build Docker image diff --git a/docs/source/user-guide/distributed/deployment/index.rst b/docs/source/user-guide/distributed/deployment/index.rst new file mode 100644 index 0000000..f5e41d0 --- /dev/null +++ b/docs/source/user-guide/distributed/deployment/index.rst @@ -0,0 +1,29 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. 
with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +Start a Ballista Cluster +======================== + +.. toctree:: + :maxdepth: 2 + + cargo-install + docker + docker-compose + kubernetes + raspberrypi + configuration diff --git a/docs/user-guide/src/distributed/kubernetes.md b/docs/source/user-guide/distributed/deployment/kubernetes.md similarity index 100% rename from docs/user-guide/src/distributed/kubernetes.md rename to docs/source/user-guide/distributed/deployment/kubernetes.md diff --git a/docs/user-guide/src/distributed/raspberrypi.md b/docs/source/user-guide/distributed/deployment/raspberrypi.md similarity index 100% rename from docs/user-guide/src/distributed/raspberrypi.md rename to docs/source/user-guide/distributed/deployment/raspberrypi.md diff --git a/docs/source/user-guide/distributed/index.rst b/docs/source/user-guide/distributed/index.rst new file mode 100644 index 0000000..abb3c7b --- /dev/null +++ b/docs/source/user-guide/distributed/index.rst @@ -0,0 +1,26 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. 
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +Ballista Distributed Compute +============================ + +.. toctree:: + :maxdepth: 2 + + introduction + deployment/index + clients/index diff --git a/docs/user-guide/src/distributed/introduction.md b/docs/source/user-guide/distributed/introduction.md similarity index 99% rename from docs/user-guide/src/distributed/introduction.md rename to docs/source/user-guide/distributed/introduction.md index aebf700..77db626 100644 --- a/docs/user-guide/src/distributed/introduction.md +++ b/docs/source/user-guide/distributed/introduction.md @@ -17,7 +17,7 @@ under the License. --> -## Overview +# Overview Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built on an architecture that allows other programming languages to be supported as first-class citizens without paying diff --git a/docs/user-guide/src/example-usage.md b/docs/source/user-guide/example-usage.md similarity index 100% rename from docs/user-guide/src/example-usage.md rename to docs/source/user-guide/example-usage.md diff --git a/docs/user-guide/src/faq.md b/docs/source/user-guide/faq.md similarity index 100% rename from docs/user-guide/src/faq.md rename to docs/source/user-guide/faq.md diff --git a/docs/user-guide/src/introduction.md b/docs/source/user-guide/introduction.md similarity index 99% rename from docs/user-guide/src/introduction.md rename to docs/source/user-guide/introduction.md index 7ba3c96..e165040 100644 --- a/docs/user-guide/src/introduction.md +++ b/docs/source/user-guide/introduction.md @@ -17,7 +17,7 @@ under the License. 
--> -# DataFusion +# Introduction DataFusion is an extensible query execution framework, written in Rust, that uses [Apache Arrow](https://arrow.apache.org) as its diff --git a/docs/user-guide/src/library.md b/docs/source/user-guide/library.md similarity index 96% rename from docs/user-guide/src/library.md rename to docs/source/user-guide/library.md index 1a1bbfb..bfaf741 100644 --- a/docs/user-guide/src/library.md +++ b/docs/source/user-guide/library.md @@ -58,4 +58,6 @@ Finally, in order to build with the `simd` optimization `cargo nightly` is requi set architecture you are building on you will want to configure the `target-cpu` as well, ideally with `native` or at least `avx2`. -`RUSTFLAGS='-C target-cpu=native' cargo +nightly run --release` +``` +RUSTFLAGS='-C target-cpu=native' cargo +nightly run --release +``` diff --git a/docs/user-guide/src/sql/datafusion-functions.md b/docs/source/user-guide/sql/datafusion-functions.md similarity index 100% rename from docs/user-guide/src/sql/datafusion-functions.md rename to docs/source/user-guide/sql/datafusion-functions.md diff --git a/docs/user-guide/src/sql/ddl.md b/docs/source/user-guide/sql/ddl.md similarity index 100% rename from docs/user-guide/src/sql/ddl.md rename to docs/source/user-guide/sql/ddl.md diff --git a/docs/source/user-guide/sql/index.rst b/docs/source/user-guide/sql/index.rst new file mode 100644 index 0000000..2489f6b --- /dev/null +++ b/docs/source/user-guide/sql/index.rst @@ -0,0 +1,26 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. 
Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +SQL Reference +============= + +.. toctree:: + :maxdepth: 2 + + select + ddl + DataFusion Functions <datafusion-functions> diff --git a/docs/user-guide/src/sql/select.md b/docs/source/user-guide/sql/select.md similarity index 94% rename from docs/user-guide/src/sql/select.md rename to docs/source/user-guide/sql/select.md index 348ffff..49399c9 100644 --- a/docs/user-guide/src/sql/select.md +++ b/docs/source/user-guide/sql/select.md @@ -37,7 +37,7 @@ DataFusion supports the following syntax for queries: </code> -# WITH clause +## WITH clause A `WITH` clause gives names to queries so that they can be referenced by name. @@ -46,7 +46,7 @@ WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a) SELECT a, b FROM x; ``` -# SELECT clause +## SELECT clause Example: @@ -61,7 +61,7 @@ By default `ALL` will be used, which returns all the rows. SELECT DISTINCT person, age FROM employees ``` -# FROM clause +## FROM clause Example: @@ -69,7 +69,7 @@ Example: SELECT t.a FROM table AS t ``` -# WHERE clause +## WHERE clause Example: @@ -77,7 +77,7 @@ Example: SELECT a FROM table WHERE a > 10 ``` -# GROUP BY clause +## GROUP BY clause Example: @@ -85,7 +85,7 @@ Example: SELECT a, b, MAX(c) FROM table GROUP BY a, b ``` -# HAVING clause +## HAVING clause Example: @@ -93,7 +93,7 @@ Example: SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10 ``` -# UNION clause +## UNION clause Example: @@ -111,7 +111,7 @@ SELECT FROM table2 ``` -# ORDER BY clause +## ORDER BY clause Orders the results by the referenced expression. By default it uses ascending order (`ASC`). This order can be changed to descending by adding `DESC` after the order-by expressions.
@@ -124,7 +124,7 @@ SELECT age, person FROM table ORDER BY age DESC; SELECT age, person FROM table ORDER BY age, person DESC; ``` -# LIMIT clause +## LIMIT clause Limits the number of rows to be a maximum of `count` rows. `count` should be a non-negative integer. diff --git a/docs/user-guide/.gitignore b/docs/user-guide/.gitignore deleted file mode 100644 index e9c0728..0000000 --- a/docs/user-guide/.gitignore +++ /dev/null @@ -1 +0,0 @@ -book \ No newline at end of file diff --git a/docs/user-guide/src/SUMMARY.md b/docs/user-guide/src/SUMMARY.md deleted file mode 100644 index 3621031..0000000 --- a/docs/user-guide/src/SUMMARY.md +++ /dev/null @@ -1,43 +0,0 @@ -<!--- - Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. 
---> - -# Summary - -- [Introduction](introduction.md) -- [Example Usage](example-usage.md) -- [Use as a Library](library.md) -- [DataFusion CLI](cli.md) -- [SQL Reference](sql/introduction.md) - - - [SELECT](sql/select.md) - - [DDL](sql/ddl.md) - - [Datafusion Specific Functions](sql/datafusion-functions.md) - -- [Ballista Distributed Compute](distributed/introduction.md) - - [Start a Ballista Cluster](distributed/deployment.md) - - [Cargo Install](distributed/cargo-install.md) - - [Docker](distributed/docker.md) - - [Docker Compose](distributed/docker-compose.md) - - [Kubernetes](distributed/kubernetes.md) - - [Raspberry Pi](distributed/raspberrypi.md) - - [Ballista Configuration](distributed/configuration.md) - - [Clients](distributed/clients.md) - - [Rust](distributed/client-rust.md) - - [Python](distributed/client-python.md) -- [Frequently Asked Questions](faq.md) diff --git a/docs/user-guide/src/distributed/clients.md b/docs/user-guide/src/distributed/clients.md deleted file mode 100644 index 7b69f19..0000000 --- a/docs/user-guide/src/distributed/clients.md +++ /dev/null @@ -1,23 +0,0 @@ -<!--- - Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. 
---> - -## Clients - -- [Rust](client-rust.md) -- [Python](client-python.md) diff --git a/docs/user-guide/src/distributed/deployment.md b/docs/user-guide/src/distributed/deployment.md deleted file mode 100644 index fee020c..0000000 --- a/docs/user-guide/src/distributed/deployment.md +++ /dev/null @@ -1,28 +0,0 @@ -<!--- - Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. ---> - -# Deployment - -There are multiple ways that a Ballista cluster can be deployed. - -- [Create a cluster using Cargo install](cargo-install.md) -- [Create a cluster using Docker](docker.md) -- [Create a cluster using Docker Compose](docker-compose.md) -- [Create a cluster using Kubernetes](kubernetes.md) -- [Create a cluster on Raspberry Pi](raspberrypi.md) diff --git a/docs/user-guide/src/sql/introduction.md b/docs/user-guide/src/sql/introduction.md deleted file mode 100644 index 89ed277..0000000 --- a/docs/user-guide/src/sql/introduction.md +++ /dev/null @@ -1,20 +0,0 @@ -<!--- - Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. 
The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. ---> - -# SQL Reference