This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new fdc88b5bbac DataFusion 17.0.0 docs (#309)
fdc88b5bbac is described below
commit fdc88b5bbace748c353c11d6431bba9dcad5244d
Author: Andy Grove <[email protected]>
AuthorDate: Fri Feb 3 16:14:47 2023 -0700
DataFusion 17.0.0 docs (#309)
---
datafusion/contributor-guide/index.html | 71 ------------------------
datafusion/objects.inv | Bin 5049 -> 5041 bytes
datafusion/searchindex.js | 2 +-
datafusion/user-guide/configs.html | 2 +-
datafusion/user-guide/dataframe.html | 17 +++---
datafusion/user-guide/introduction.html | 10 ++--
datafusion/user-guide/sql/data_types.html | 30 ++++++----
datafusion/user-guide/sql/scalar_functions.html | 21 +++++++
8 files changed, 58 insertions(+), 95 deletions(-)
diff --git a/datafusion/contributor-guide/index.html
b/datafusion/contributor-guide/index.html
index 66e52e37f5b..7e24d81c5ff 100644
--- a/datafusion/contributor-guide/index.html
+++ b/datafusion/contributor-guide/index.html
@@ -319,28 +319,6 @@
sqllogictests Tests
</a>
</li>
- <li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link"
href="#sql-postgres-integration-tests">
- SQL / Postgres Integration Tests
- </a>
- <ul class="nav section-nav flex-column">
- <li class="toc-h4 nav-item toc-entry">
- <a class="reference internal nav-link" href="#setup-environment">
- setup environment
- </a>
- </li>
- <li class="toc-h4 nav-item toc-entry">
- <a class="reference internal nav-link" href="#install-dependencies">
- Install dependencies
- </a>
- </li>
- <li class="toc-h4 nav-item toc-entry">
- <a class="reference internal nav-link" href="#invoke-the-test-runner">
- Invoke the test runner
- </a>
- </li>
- </ul>
- </li>
</ul>
</li>
<li class="toc-h2 nav-item toc-entry">
@@ -556,55 +534,6 @@ and tries to follow <a class="reference external"
href="https://doc.rust-lang.or
<p>Data Driven tests have many benefits including being easier to write and
maintain. We are in the process of <a class="reference external"
href="https://github.com/apache/arrow-datafusion/issues/4460">migrating
sql_integration tests</a> and encourage
you to add new tests using sqllogictests if possible.</p>
</section>
-<section id="sql-postgres-integration-tests">
-<h3>SQL / Postgres Integration Tests<a class="headerlink"
href="#sql-postgres-integration-tests" title="Permalink to this
heading">¶</a></h3>
-<p>The <a class="reference external"
href="https://github.com/apache/arrow-datafusion/blob/master/integration-tests">integration-tests</a>
directory contains a harness that runs certain queries against both postgres
and datafusion and compares results</p>
-<section id="setup-environment">
-<h4>setup environment<a class="headerlink" href="#setup-environment"
title="Permalink to this heading">¶</a></h4>
-<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span><span class="nb">export</span> <span
class="nv">POSTGRES_DB</span><span class="o">=</span>postgres
-<span class="nb">export</span> <span class="nv">POSTGRES_USER</span><span
class="o">=</span>postgres
-<span class="nb">export</span> <span class="nv">POSTGRES_HOST</span><span
class="o">=</span>localhost
-<span class="nb">export</span> <span class="nv">POSTGRES_PORT</span><span
class="o">=</span><span class="m">5432</span>
-</pre></div>
-</div>
-</section>
-<section id="install-dependencies">
-<h4>Install dependencies<a class="headerlink" href="#install-dependencies"
title="Permalink to this heading">¶</a></h4>
-<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span><span class="c1"># Install
dependencies</span>
-python -m pip install --upgrade pip setuptools wheel
-python -m pip install -r integration-tests/requirements.txt
-
-<span class="c1"># setup environment</span>
-<span class="nv">POSTGRES_DB</span><span class="o">=</span>postgres <span
class="nv">POSTGRES_USER</span><span class="o">=</span>postgres <span
class="nv">POSTGRES_HOST</span><span class="o">=</span>localhost <span
class="nv">POSTGRES_PORT</span><span class="o">=</span><span
class="m">5432</span> python -m pytest -v integration-tests/test_psql_parity.py
-
-<span class="c1"># Create</span>
-psql -d <span class="s2">"</span><span
class="nv">$POSTGRES_DB</span><span class="s2">"</span> -h <span
class="s2">"</span><span class="nv">$POSTGRES_HOST</span><span
class="s2">"</span> -p <span class="s2">"</span><span
class="nv">$POSTGRES_PORT</span><span class="s2">"</span> -U <span
class="s2">"</span><span class="nv">$POSTGRES_USER</span><span
class="s2">"</span> -c <span class="s1">'CREATE TABLE IF NOT EXISTS
test (</span>
-<span class="s1"> c1 character varying NOT NULL,</span>
-<span class="s1"> c2 integer NOT NULL,</span>
-<span class="s1"> c3 smallint NOT NULL,</span>
-<span class="s1"> c4 smallint NOT NULL,</span>
-<span class="s1"> c5 integer NOT NULL,</span>
-<span class="s1"> c6 bigint NOT NULL,</span>
-<span class="s1"> c7 smallint NOT NULL,</span>
-<span class="s1"> c8 integer NOT NULL,</span>
-<span class="s1"> c9 bigint NOT NULL,</span>
-<span class="s1"> c10 character varying NOT NULL,</span>
-<span class="s1"> c11 double precision NOT NULL,</span>
-<span class="s1"> c12 double precision NOT NULL,</span>
-<span class="s1"> c13 character varying NOT NULL</span>
-<span class="s1">);'</span>
-
-psql -d <span class="s2">"</span><span
class="nv">$POSTGRES_DB</span><span class="s2">"</span> -h <span
class="s2">"</span><span class="nv">$POSTGRES_HOST</span><span
class="s2">"</span> -p <span class="s2">"</span><span
class="nv">$POSTGRES_PORT</span><span class="s2">"</span> -U <span
class="s2">"</span><span class="nv">$POSTGRES_USER</span><span
class="s2">"</span> -c <span class="s2">"\copy test FROM
'</span><span class="k">$(</span><s [...]
-</pre></div>
-</div>
-</section>
-<section id="invoke-the-test-runner">
-<h4>Invoke the test runner<a class="headerlink" href="#invoke-the-test-runner"
title="Permalink to this heading">¶</a></h4>
-<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span>python -m pytest -v
integration-tests/test_psql_parity.py
-</pre></div>
-</div>
-</section>
-</section>
</section>
<section id="benchmarks">
<h2>Benchmarks<a class="headerlink" href="#benchmarks" title="Permalink to
this heading">¶</a></h2>
diff --git a/datafusion/objects.inv b/datafusion/objects.inv
index 0b79f2c92d9..6afe2f127ea 100644
Binary files a/datafusion/objects.inv and b/datafusion/objects.inv differ
diff --git a/datafusion/searchindex.js b/datafusion/searchindex.js
index 0323e771313..84ecfff9ed2 100644
--- a/datafusion/searchindex.js
+++ b/datafusion/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/communication",
"contributor-guide/index", "contributor-guide/quarterly_roadmap",
"contributor-guide/roadmap", "contributor-guide/specification/index",
"contributor-guide/specification/invariants",
"contributor-guide/specification/output-field-name-semantic", "index",
"user-guide/cli", "user-guide/configs", "user-guide/dataframe",
"user-guide/example-usage", "user-guide/expressions", "user-guide/faq",
"user-guide/introduction", "user-guide [...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/communication",
"contributor-guide/index", "contributor-guide/quarterly_roadmap",
"contributor-guide/roadmap", "contributor-guide/specification/index",
"contributor-guide/specification/invariants",
"contributor-guide/specification/output-field-name-semantic", "index",
"user-guide/cli", "user-guide/configs", "user-guide/dataframe",
"user-guide/example-usage", "user-guide/expressions", "user-guide/faq",
"user-guide/introduction", "user-guide [...]
\ No newline at end of file
diff --git a/datafusion/user-guide/configs.html
b/datafusion/user-guide/configs.html
index b340d102208..9a36ed979fe 100644
--- a/datafusion/user-guide/configs.html
+++ b/datafusion/user-guide/configs.html
@@ -328,7 +328,7 @@ Environment variables are read during <code class="docutils
literal notranslate"
<tbody>
<tr
class="row-even"><td><p>datafusion.catalog.create_default_catalog_and_schema</p></td>
<td><p>true</p></td>
-<td><p>Number of partitions for query execution. Increasing partitions can
increase concurrency. Defaults to the number of cpu cores on the
system.</p></td>
+<td><p>Whether the default catalog and schema should be created
automatically.</p></td>
</tr>
<tr class="row-odd"><td><p>datafusion.catalog.default_catalog</p></td>
<td><p>datafusion</p></td>
diff --git a/datafusion/user-guide/dataframe.html
b/datafusion/user-guide/dataframe.html
index 8ac113dc540..39af087c84e 100644
--- a/datafusion/user-guide/dataframe.html
+++ b/datafusion/user-guide/dataframe.html
@@ -426,25 +426,28 @@ execution. The plan is evaluated (executed) when an
action method is invoked, su
<tr class="row-odd"><td><p>collect_partitioned</p></td>
<td><p>Executes this DataFrame and collects all results into a vector of
vector of RecordBatch maintaining the input partitioning.</p></td>
</tr>
-<tr class="row-even"><td><p>execute_stream</p></td>
+<tr class="row-even"><td><p>count</p></td>
+<td><p>Executes this DataFrame to get the total number of rows.</p></td>
+</tr>
+<tr class="row-odd"><td><p>execute_stream</p></td>
<td><p>Executes this DataFrame and returns a stream over a single
partition.</p></td>
</tr>
-<tr class="row-odd"><td><p>execute_stream_partitioned</p></td>
+<tr class="row-even"><td><p>execute_stream_partitioned</p></td>
<td><p>Executes this DataFrame and returns one stream per partition.</p></td>
</tr>
-<tr class="row-even"><td><p>show</p></td>
+<tr class="row-odd"><td><p>show</p></td>
<td><p>Execute this DataFrame and print the results to stdout.</p></td>
</tr>
-<tr class="row-odd"><td><p>show_limit</p></td>
+<tr class="row-even"><td><p>show_limit</p></td>
<td><p>Execute this DataFrame and print a subset of results to stdout.</p></td>
</tr>
-<tr class="row-even"><td><p>write_csv</p></td>
+<tr class="row-odd"><td><p>write_csv</p></td>
<td><p>Execute this DataFrame and write the results to disk in CSV
format.</p></td>
</tr>
-<tr class="row-odd"><td><p>write_json</p></td>
+<tr class="row-even"><td><p>write_json</p></td>
<td><p>Execute this DataFrame and write the results to disk in JSON
format.</p></td>
</tr>
-<tr class="row-even"><td><p>write_parquet</p></td>
+<tr class="row-odd"><td><p>write_parquet</p></td>
<td><p>Execute this DataFrame and write the results to disk in Parquet
format.</p></td>
</tr>
</tbody>
diff --git a/datafusion/user-guide/introduction.html
b/datafusion/user-guide/introduction.html
index e07772131aa..eacb0d9f2dc 100644
--- a/datafusion/user-guide/introduction.html
+++ b/datafusion/user-guide/introduction.html
@@ -323,10 +323,10 @@
<p>DataFusion is an extensible query execution framework, written in
Rust, that uses <a class="reference external"
href="https://arrow.apache.org">Apache Arrow</a> as its
in-memory format.</p>
-<p>DataFusion supports both an SQL and a DataFrame API for building
-logical query plans as well as a query optimizer and execution engine
-capable of parallel execution against partitioned data sources (CSV
-and Parquet) using threads.</p>
+<p>DataFusion supports SQL and a DataFrame API for building logical query
+plans, an extensive query optimizer, and a multi-threaded parallel
+execution engine for processing partitioned data sources
+such as CSV and Parquet files extremely quickly.</p>
<section id="use-cases">
<h2>Use Cases<a class="headerlink" href="#use-cases" title="Permalink to this
heading">¶</a></h2>
<p>DataFusion is used to create modern, fast and efficient data
@@ -337,7 +337,7 @@ the convenience of an SQL interface or a DataFrame API.</p>
<section id="why-datafusion">
<h2>Why DataFusion?<a class="headerlink" href="#why-datafusion"
title="Permalink to this heading">¶</a></h2>
<ul class="simple">
-<li><p><em>High Performance</em>: Leveraging Rust and Arrow’s memory model,
DataFusion achieves very high performance</p></li>
+<li><p><em>High Performance</em>: Leveraging Rust and Arrow’s memory model,
DataFusion is very fast.</p></li>
<li><p><em>Easy to Connect</em>: Being part of the Apache Arrow ecosystem
(Arrow, Parquet and Flight), DataFusion works well with the rest of the big
data ecosystem</p></li>
<li><p><em>Easy to Embed</em>: Allowing extension at almost any point in its
design, DataFusion can be tailored for your specific usecase</p></li>
<li><p><em>High Quality</em>: Extensively tested, both by itself and with the
rest of the Arrow ecosystem, DataFusion can be used as the foundation for
production systems.</p></li>
diff --git a/datafusion/user-guide/sql/data_types.html
b/datafusion/user-guide/sql/data_types.html
index 0e93829c5a3..acfead9679e 100644
--- a/datafusion/user-guide/sql/data_types.html
+++ b/datafusion/user-guide/sql/data_types.html
@@ -345,6 +345,16 @@ execution. The SQL types from
<a class="reference external"
href="https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/data_type.rs#L27">sqlparser-rs</a>
are mapped to <a class="reference external"
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html">Arrow
data types</a> according to the following table.
This mapping occurs when defining the schema in a <code class="docutils
literal notranslate"><span class="pre">CREATE</span> <span
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> command or
when performing a SQL <code class="docutils literal notranslate"><span
class="pre">CAST</span></code> operation.</p>
+<p>You can see the corresponding Arrow type for any SQL expression using
+the <code class="docutils literal notranslate"><span
class="pre">arrow_typeof</span></code> function. For example:</p>
+<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span>❯ select arrow_typeof(interval '1
month');
++-------------------------------------+
+| arrowtypeof(IntervalYearMonth("1")) |
++-------------------------------------+
+| Interval(YearMonth) |
++-------------------------------------+
+</pre></div>
+</div>
<section id="character-types">
<h2>Character Types<a class="headerlink" href="#character-types"
title="Permalink to this heading">¶</a></h2>
<table class="table">
@@ -363,6 +373,9 @@ This mapping occurs when defining the schema in a <code
class="docutils literal
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">TEXT</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span
class="pre">Utf8</span></code></p></td>
</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">STRING</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">Utf8</span></code></p></td>
+</tr>
</tbody>
</table>
</section>
@@ -445,6 +458,9 @@ This mapping occurs when defining the schema in a <code
class="docutils literal
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">TIMESTAMP</span></code></p></td>
<td class="text-left"><p><code class="docutils literal notranslate"><span
class="pre">Timestamp(TimeUnit::Nanosecond,</span> <span
class="pre">None)</span></code></p></td>
</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">INTERVAL</span></code></p></td>
+<td class="text-left"><p><code class="docutils literal notranslate"><span
class="pre">Interval(IntervalUnit::YearMonth)</span></code> or <code
class="docutils literal notranslate"><span
class="pre">Interval(IntervalUnit::DayTime)</span></code></p></td>
+</tr>
</tbody>
</table>
</section>
@@ -508,22 +524,16 @@ This mapping occurs when defining the schema in a <code
class="docutils literal
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">NVARCHAR</span></code></p></td>
<td class="text-left"><p><em>Not yet supported</em></p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">STRING</span></code></p></td>
-<td class="text-left"><p><em>Not yet supported</em></p></td>
-</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CUSTOM</span></code></p></td>
-<td class="text-left"><p><em>Not yet supported</em></p></td>
-</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">ARRAY</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">CUSTOM</span></code></p></td>
<td class="text-left"><p><em>Not yet supported</em></p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">ENUM</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">ARRAY</span></code></p></td>
<td class="text-left"><p><em>Not yet supported</em></p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">SET</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">ENUM</span></code></p></td>
<td class="text-left"><p><em>Not yet supported</em></p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">INTERVAL</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">SET</span></code></p></td>
<td class="text-left"><p><em>Not yet supported</em></p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">DATETIME</span></code></p></td>
diff --git a/datafusion/user-guide/sql/scalar_functions.html
b/datafusion/user-guide/sql/scalar_functions.html
index 366208a5241..e7d52208925 100644
--- a/datafusion/user-guide/sql/scalar_functions.html
+++ b/datafusion/user-guide/sql/scalar_functions.html
@@ -876,6 +876,15 @@
</code>
</a>
</li>
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#arrow-typeof">
+ <code class="docutils literal notranslate">
+ <span class="pre">
+ arrow_typeof
+ </span>
+ </code>
+ </a>
+ </li>
<li class="toc-h3 nav-item toc-entry">
<a class="reference internal nav-link" href="#in-list">
<code class="docutils literal notranslate">
@@ -1320,6 +1329,18 @@ wherever it appears in the statement, using a value
chosen at planning time.</p>
<section id="array">
<h3><code class="docutils literal notranslate"><span
class="pre">array</span></code><a class="headerlink" href="#array"
title="Permalink to this heading">¶</a></h3>
</section>
+<section id="arrow-typeof">
+<h3><code class="docutils literal notranslate"><span
class="pre">arrow_typeof</span></code><a class="headerlink"
href="#arrow-typeof" title="Permalink to this heading">¶</a></h3>
+<p>Returns the underlying Arrow type of the expression:</p>
+<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span>❯ select arrow_typeof(4 + 4.3);
++--------------------------------------+
+| arrowtypeof(Int64(4) + Float64(4.3)) |
++--------------------------------------+
+| Float64 |
++--------------------------------------+
+</pre></div>
+</div>
+</section>
<section id="in-list">
<h3><code class="docutils literal notranslate"><span
class="pre">in_list</span></code><a class="headerlink" href="#in-list"
title="Permalink to this heading">¶</a></h3>
</section>