This is an automated email from the ASF dual-hosted git repository.
bridgetb pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/drill-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d70c850 team update-DRILL-6744 edits
d70c850 is described below
commit d70c850599f5cc6401d8a98ae84d75e0c6635ed6
Author: Bridget Bevens <[email protected]>
AuthorDate: Fri Dec 14 14:01:44 2018 -0800
team update-DRILL-6744 edits
---
docs/parquet-filter-pushdown/index.html | 85 ++++++++++++---
feed.xml | 4 +-
team/index.html | 188 ++++++++++++++++++++------------
3 files changed, 190 insertions(+), 87 deletions(-)
diff --git a/docs/parquet-filter-pushdown/index.html
b/docs/parquet-filter-pushdown/index.html
index 9465889..85ec38e 100644
--- a/docs/parquet-filter-pushdown/index.html
+++ b/docs/parquet-filter-pushdown/index.html
@@ -1268,7 +1268,7 @@
</div>
- Sep 28, 2018
+ Dec 14, 2018
<link href="/css/docpage.css" rel="stylesheet" type="text/css">
@@ -1276,9 +1276,10 @@
<p>Drill 1.9 introduces the Parquet filter pushdown option. Parquet
filter pushdown is a performance optimization that prunes extraneous data from
a Parquet file to reduce the amount of data that Drill scans and reads when a
query on a Parquet file contains a filter expression. Pruning data reduces the
I/O, CPU, and network overhead to optimize Drill’s performance.</p>
-<p>Parquet filter pushdown is enabled by default. When a query contains a
filter expression, you can run the <a href="/docs/explain/">EXPLAIN PLAN
command</a> to see if Drill applies Parquet filter pushdown to the query. You
can enable and disable this feature using the <a
href="/docs/alter-system/">ALTER SYSTEM|SESSION SET</a> command with the
<code>planner.store.parquet.rowgroup.filter.pushdown</code> option. </p>
-
-<p>As of Drill 1.13, the query planner in Drill can apply project push down,
filter push down, and partition pruning to star queries in common table
expressions (CTEs), views, and subqueries, for example: </p>
+<p>Parquet filter pushdown is enabled by default. When a query contains a
filter expression, you can run the <a href="/docs/explain/">EXPLAIN PLAN
command</a> to see if Drill applies Parquet filter pushdown to the query. You
can enable and disable this feature through the
<code>planner.store.parquet.rowgroup.filter.pushdown</code> option, as shown:
</p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SET
`planner.store.parquet.rowgroup.filter.pushdown`='false'
+</code></pre></div>
+<p>Starting in Drill 1.13, the query planner in Drill can apply project push
down, filter push down, and partition pruning to star queries in common table
expressions (CTEs), views, and subqueries, for example: </p>
<div class="highlight"><pre><code class="language-text" data-lang="text">
select col1 from (select * from t)
</code></pre></div>
<p>When a CTE, view, or subquery contains a star filter condition, the query
planner in Drill can apply the filter and prune extraneous data, further
reducing the amount of data that the scanner reads and improving performance.
</p>
@@ -1293,15 +1294,64 @@
<p>The query planner looks at the minimum and maximum values in each row group
for an intersection. If no intersection exists, the planner can prune the row
group in the table. If the minimum and maximum value range is too large, Drill
does not apply Parquet filter pushdown. The query planner can typically prune
more data when the tables in the Parquet file are sorted by row groups. </p>
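A minimal sketch of the minimum/maximum intersection check described in the paragraph above, assuming a filter that reduces to a closed range on a single column. The names RowGroupStats, can_prune, and row_groups_to_scan are hypothetical and exist only for illustration; this is not Drill's planner code.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical container for one row group's column statistics."""
    name: str
    min_value: int
    max_value: int


def can_prune(stats: RowGroupStats, lo: int, hi: int) -> bool:
    # A row group can be skipped when the filter range [lo, hi] does not
    # intersect the row group's [min_value, max_value] range.
    return stats.max_value < lo or stats.min_value > hi


def row_groups_to_scan(groups, lo, hi):
    # Keep only the row groups that might contain matching rows.
    return [g.name for g in groups if not can_prune(g, lo, hi)]


# Illustrative statistics, e.g. for a filter like: col BETWEEN 120 AND 160
groups = [
    RowGroupStats("rg0", 1, 100),
    RowGroupStats("rg1", 101, 200),
    RowGroupStats("rg2", 150, 300),
]
remaining = row_groups_to_scan(groups, 120, 160)
```

In this sketch, presorted data gives the row groups narrow, mostly non-overlapping ranges, so more of them fail the intersection check and can be skipped.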
+<h2 id="parquet-filter-pushdown-for-varchar-and-decimal-data-types">Parquet
Filter Pushdown for VARCHAR and DECIMAL Data Types</h2>
+
+<p>Starting in Drill 1.15, Drill supports Parquet filter pushdown for the
VARCHAR and DECIMAL data types. Drill uses binary statistics in the Parquet
file or Drill metadata file to push filters on VARCHAR and DECIMAL data types
down to the data source. </p>
+
+<h3 id="parquet-generated-files">Parquet Generated Files</h3>
+
+<p>By default, Parquet filter pushdown works for VARCHAR and DECIMAL data
types if the Parquet files were created with Parquet version 1.10.0 or later.
Drill 1.13 and later use Parquet 1.10.0 to write and read back Parquet files.
</p>
+
+<p>If Parquet files were created with a pre-1.10.0 version of Parquet, and the
data in the binary columns is in ASCII format (not UTF-8), enable the
<code>store.parquet.reader.strings_signed_min_max</code> option, which allows
Drill to use binary statistics in older Parquet files. </p>
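The ASCII condition above can be illustrated with a short sketch. It assumes, as the option name strings_signed_min_max suggests, that the older statistics were computed by comparing bytes as signed values, while UTF-8 ordering compares them as unsigned values. The helper names below are hypothetical and this is not Parquet's or Drill's code.

```python
def unsigned_key(data: bytes):
    # Bytes compared as values 0..255 (the ordering expected for UTF-8).
    return list(data)


def signed_key(data: bytes):
    # Bytes compared as values -128..127 (assumed ordering in older files).
    return [b - 256 if b >= 128 else b for b in data]


def min_max(values, key):
    # Minimum and maximum of the values under the given byte ordering.
    return min(values, key=key), max(values, key=key)


# Pure ASCII data: every byte is below 128, so both orderings agree.
ascii_values = [b"apple", b"zebra", b"mango"]

# UTF-8 data with a non-ASCII character: the lead byte of "é" is 0xC3,
# which is large when unsigned but negative when signed.
utf8_values = [s.encode("utf-8") for s in ["apple", "zebra", "éclair"]]

ascii_agree = min_max(ascii_values, signed_key) == min_max(ascii_values, unsigned_key)
utf8_agree = min_max(utf8_values, signed_key) == min_max(utf8_values, unsigned_key)
```

Under these assumptions the two orderings produce the same minimum and maximum for ASCII data, which is why enabling the option is only safe when the binary columns are known to contain ASCII.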
+
+<p><strong>Note:</strong> DECIMAL filter pushdown only works for Parquet files
created by Parquet 1.10.0 or later due to issue <a
href="https://issues.apache.org/jira/browse/PARQUET-1322">PARQUET-1322</a>.
</p>
+
+<h3 id="parquet-files-created-by-hive">Parquet Files Created by Hive</h3>
+
+<p>In Hive 2.3, Parquet files are created by a pre-1.10.0 version of Parquet.
If the data in the binary columns is in ASCII format, you can enable the
<code>store.parquet.reader.strings_signed_min_max</code> option to enable
pushdown support for VARCHAR data types. DECIMAL filter pushdown is not
supported. </p>
+
+<h3 id="drill-generated-metadata-files">Drill Generated Metadata Files</h3>
+
+<p>Parquet filter pushdown for DECIMAL and VARCHAR data types may not work
correctly on Drill metadata files that were generated prior to Drill 1.15.
Regenerate all Drill metadata files using Drill 1.15 or later to ensure that
Parquet filter pushdown on VARCHAR and DECIMAL data types works correctly on
Drill generated metadata files.</p>
+
+<p>If the <code>store.parquet.reader.strings_signed_min_max</code> option is
not enabled during regeneration, the minimum and maximum values for the binary
data will not be written. When the binary data is in ASCII format, enabling the
<code>store.parquet.reader.strings_signed_min_max</code> option during
regeneration ensures that the minimum and maximum values are written and thus
read back and used during filter pushdown. </p>
+
+<h3 id="enabling-statistics-use-for-pre-1-10-0-parquet-files">Enabling
Statistics Use for Pre-1.10.0 Parquet Files</h3>
+
+<p>If Parquet files were created with a pre-1.10.0 version of Parquet, and the
data in binary columns is in ASCII format (not UTF-8), you can enable Drill to
use the statistics for Parquet filter pushdown on VARCHAR and DECIMAL data
types.</p>
+
+<p>You can use either of the following methods to enable this functionality in
Drill: </p>
+
+<ul>
+<li><p>In the <code>parquet</code> format plugin configuration, add the
<code>enableStringsSignedMinMax</code> option, and set the option to
<code>true</code>, as shown: </p>
+<div class="highlight"><pre><code class="language-text"
data-lang="text">"parquet" : {
+ type: "parquet",
+ enableStringsSignedMinMax: true
+ }
+</code></pre></div>
+<p>This configuration applies to all Parquet files that the storage plugin
containing this <code>parquet</code> format configuration points to, including
files in the configured workspaces.</p></li>
+<li><p>From the command line, enable the
<code>store.parquet.reader.strings_signed_min_max</code> option at the session
or system level, as shown: </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SET
`store.parquet.reader.strings_signed_min_max`='true';
+ALTER SYSTEM SET `store.parquet.reader.strings_signed_min_max`='true';
+</code></pre></div>
+<p><strong>Note:</strong> </p>
+
+<ul>
+<li>The <code>store.parquet.reader.strings_signed_min_max</code> option allows
three values: <code>'true'</code>, <code>'false'</code>,
<code>''</code> (empty string). By default, the value is an empty
string.<br></li>
+<li>Setting this option at the system level applies to all Parquet files in
the system. Alternatively, you can set this option in the Drill Web UI. Options
in the Drill Web UI are set at the system level.<br></li>
+<li>When set at the session level, the setting takes precedence over the
setting in the parquet format plugin and overrides the system level
setting.<br></li>
+</ul></li>
+</ul>
+
<h2 id="using-parquet-filter-pushdown">Using Parquet Filter Pushdown</h2>
<p>Currently, Parquet filter pushdown only supports filters that reference
columns from a single table (local filters). Parquet filter pushdown requires
the minimum and maximum values in the Parquet file metadata. All Parquet files
created in Drill using the CTAS statement contain the necessary metadata. If
your Parquet files were created using another tool, you may need to use Drill
to read and rewrite the files using the <a
href="/docs/create-table-as-ctas/">CTAS command</a>.</p>
-<p>Parquet filter pushdown works best if you presort the data. You do not have
to sort the entire data set at once. You can sort a subset of the data set,
sort another subset, and so on. </p>
+<p>Parquet filter pushdown works best if you presort the data. You do not have
to sort the entire data set at once. You can sort a subset of the data set,
sort another subset, and so on. </p>
<h3 id="configuring-parquet-filter-pushdown">Configuring Parquet Filter
Pushdown</h3>
-<p>Use the <a href="/docs/alter-system/">ALTER SYSTEM|SESSION SET</a> command
with the Parquet filter pushdown options to enable or disable the feature, and
set the number of row groups for a table. </p>
+<p>Use the <a href="/docs/alter-system/">ALTER SYSTEM</a> or <a
href="/docs/set/">SET</a> command with the Parquet filter pushdown options to
enable or disable the related features. </p>
<p>The following table lists the Parquet filter pushdown options with their
descriptions and default values: </p>
@@ -1313,21 +1363,22 @@
</tr>
</thead><tbody>
<tr>
-<td>"planner.store.parquet.rowgroup.filter.pushdown"</td>
-<td>Turns the Parquet filter pushdown feature on or off.</td>
+<td>planner.store.parquet.rowgroup.filter.pushdown</td>
+<td>Turns the Parquet filter pushdown feature on or off.</td>
<td>TRUE</td>
</tr>
<tr>
-<td>"planner.store.parquet.rowgroup.filter.pushdown.threshold"</td>
-<td>Sets the number of row groups that a table can have. You can increase
the threshold if the filter can prune many row groups. However, if this
setting is too high, the filter evaluation overhead increases. Base this
setting on the data set. Reduce this setting if the planning time is
significant, or you do not see any benefit at runtime.</td>
+<td>planner.store.parquet.rowgroup.filter.pushdown.threshold</td>
+<td>Sets the number of row groups that a table can have. You can increase
the threshold if the filter can prune many row groups. However, if this
setting is too high, the filter evaluation overhead increases. Base this
setting on the data set. Reduce this setting if the planning time is
significant, or you do not see any benefit at runtime.</td>
<td>10,000</td>
</tr>
+<tr>
+<td>store.parquet.reader.strings_signed_min_max</td>
+<td>Allows binary statistics usage for Parquet files created with a
pre-1.10.0 version of Parquet. Files created pre-1.10.0 have incorrectly
calculated statistics for UTF-8 data. If you know that data in the binary
columns is in ASCII (not UTF-8), setting this option to 'true'
enables statistics usage for VARCHAR and DECIMAL data types. The default is
unset (empty string). Allowed values are 'true', 'false',
'' (empty string).</td>
+<td>'' (empty string)</td>
+</tr>
</tbody></table>
-<h3 id="viewing-the-query-plan">Viewing the Query Plan</h3>
-
-<p>Because Drill applies Parquet filter pushdown during the query planning
phase, you can view the query execution plan to see if Drill pushes down the
filter when a query on a Parquet file contains a filter expression. You can run
the <a href="/docs/explain/">EXPLAIN PLAN command</a> to see the execution plan
for the query, as shown in the following example.</p>
-
<p><strong>Example</strong> </p>
<p>Starting in Drill 1.14, Drill supports the planner rule,
JoinPushTransitivePredicatesRule, which enables Drill to infer filter
conditions for join queries and push the filter conditions down to the data
source. </p>
@@ -1349,7 +1400,7 @@
<p>The following table lists the supported and unsupported clauses, operators,
data types, functions, and scenarios for Parquet filter pushdown: </p>
-<p><strong>Note:</strong> <sup>1</sup> indicates support as of Drill 1.13.
<sup>2</sup> indicates support as of Drill 1.14. </p>
+<p><strong>Note:</strong> <sup>1</sup> indicates support as of Drill 1.13.
<sup>2</sup> indicates support as of Drill 1.14. <sup>3</sup> indicates support
as of Drill 1.15. </p>
<table><thead>
<tr>
@@ -1375,8 +1426,8 @@
</tr>
<tr>
<td>Data Types</td>
-<td>INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, TIME, <sup>1</sup>BOOLEAN
(true, false)</td>
-<td>CHAR, VARCHAR columns, Hive TIMESTAMP</td>
+<td>INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, TIME, <sup>1</sup>BOOLEAN
(true, false), <sup>3</sup>VARCHAR and DECIMAL columns</td>
+<td>CHAR, Hive TIMESTAMP</td>
</tr>
<tr>
<td>Function</td>
diff --git a/feed.xml b/feed.xml
index 95b9f29..24914ae 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
</description>
<link>/</link>
<atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
- <pubDate>Tue, 11 Dec 2018 13:49:25 -0800</pubDate>
- <lastBuildDate>Tue, 11 Dec 2018 13:49:25 -0800</lastBuildDate>
+ <pubDate>Fri, 14 Dec 2018 13:58:54 -0800</pubDate>
+ <lastBuildDate>Fri, 14 Dec 2018 13:58:54 -0800</lastBuildDate>
<generator>Jekyll v2.5.2</generator>
<item>
diff --git a/team/index.html b/team/index.html
index c7456fd..59846ab 100644
--- a/team/index.html
+++ b/team/index.html
@@ -128,158 +128,210 @@
<table><thead>
<tr>
-<th>Name</th>
-<th>Alias (email is &lt;alias&gt;@apache.org)</th>
+<th><strong>Name</strong></th>
+<th><strong>Alias (email is &lt;alias&gt;@apache.org)</strong></th>
</tr>
</thead><tbody>
<tr>
-<td>Jacques Nadeau</td>
-<td>jacques</td>
+<td>Abdel Hakim Deneche</td>
+<td>adeneche</td>
</tr>
<tr>
-<td>Tomer Shiran</td>
-<td>tshiran</td>
+<td>Aditya Kishore</td>
+<td>adi</td>
</tr>
<tr>
-<td>Ted Dunning</td>
-<td>tdunning</td>
+<td>Abhishek Girish</td>
+<td>agirish</td>
</tr>
<tr>
-<td>Jason Frantz</td>
-<td>jason</td>
+<td>AnilKumar B</td>
+<td>akumarb2010</td>
</tr>
<tr>
-<td>MC Srivas</td>
-<td>srivas</td>
+<td>Aman Sinha</td>
+<td>amansinha</td>
</tr>
<tr>
-<td>Julian Hyde</td>
-<td>jhyde</td>
+<td>Arina Ielchiieva</td>
+<td>arina</td>
</tr>
<tr>
-<td>Tim Chen</td>
-<td>tnachen</td>
+<td>Boaz Ben-Zvi</td>
+<td>boaz</td>
</tr>
<tr>
-<td>Mehant Baid</td>
-<td>mehant</td>
+<td>Bridget Bevens</td>
+<td>bridgetb</td>
</tr>
<tr>
-<td>Jinfeng Ni</td>
-<td>jni</td>
+<td>Kamesh Bhallamudi</td>
+<td>bvskamesh</td>
</tr>
<tr>
-<td>Venki Korukanti</td>
-<td>venki</td>
+<td>Charles Givre</td>
+<td>cgivre</td>
</tr>
<tr>
-<td>Jason Altekruse</td>
-<td>json</td>
+<td>Chunhui Shi</td>
+<td>cshi</td>
</tr>
<tr>
-<td>Aditya Kishore</td>
-<td>adi</td>
+<td>Chris Wensel</td>
+<td>cwensel</td>
</tr>
<tr>
-<td>Parth Chandra</td>
-<td>parthc</td>
+<td>Chris Westin</td>
+<td>cwestin</td>
</tr>
<tr>
-<td>Aman Sinha</td>
-<td>amansinha</td>
+<td>Ellen Friedman</td>
+<td>ellenf</td>
</tr>
<tr>
-<td>Steven Phillips</td>
-<td>smp</td>
+<td>German Shegalov</td>
+<td>gera</td>
</tr>
<tr>
-<td>Bridget Bevens</td>
-<td>bridgetb</td>
+<td>Gautam Parai</td>
+<td>gparai</td>
+</tr>
+<tr>
+<td>Grant Ingersoll</td>
+<td>gsingers</td>
</tr>
<tr>
<td>Hanifi Gunes</td>
<td>hg</td>
</tr>
<tr>
-<td>Abdelhakim Deneche</td>
-<td>adeneche</td>
+<td>Hanumath Rao Maduri</td>
+<td>hmaduri</td>
</tr>
<tr>
-<td>Sudheesh Katkam</td>
-<td>sudheesh</td>
+<td>Hsuan-Yi Chu</td>
+<td>hsuanyichu</td>
</tr>
<tr>
-<td>Ellen Friedman</td>
-<td>ellenf</td>
+<td>Isabel Drost-Fromm</td>
+<td>isabel</td>
</tr>
<tr>
-<td>Kris Hahn</td>
-<td>krishahn</td>
+<td>Jacques Nadeau</td>
+<td>jacques</td>
</tr>
<tr>
-<td>Neeraja Rentachintala</td>
-<td>neerajar</td>
+<td>Jason Frantz</td>
+<td>jason</td>
</tr>
<tr>
-<td>Chris Westin</td>
-<td>cwestin</td>
+<td>Julian Hyde</td>
+<td>jhyde</td>
</tr>
<tr>
-<td>Abhishek Girish</td>
-<td>agirish</td>
+<td>Jinfeng Ni</td>
+<td>jni</td>
</tr>
<tr>
-<td>Rahul Challapalli</td>
-<td>rkins</td>
+<td>Jason Altekruse</td>
+<td>json</td>
</tr>
<tr>
-<td>Arina Ielchiieva</td>
-<td>arina</td>
+<td>Karthikeyan Manivannan</td>
+<td>karthikm</td>
</tr>
<tr>
-<td>Paul Rogers</td>
-<td>progers</td>
+<td>Keys Botzum</td>
+<td>kbotzum</td>
+</tr>
+<tr>
+<td>Kris Hahn</td>
+<td>krishahn</td>
+</tr>
+<tr>
+<td>Kunal Khatua</td>
+<td>kunal</td>
</tr>
<tr>
<td>Laurent Goujon</td>
<td>laurent</td>
</tr>
<tr>
-<td>Charles Givre</td>
-<td>cgivre</td>
+<td>Mehant Baid</td>
+<td>mehant</td>
</tr>
<tr>
-<td>Boaz Ben-Zvi</td>
-<td>boaz</td>
+<td>Neeraja Rentachintala</td>
+<td>neerajar</td>
</tr>
<tr>
-<td>Anil Kumar Batchu</td>
-<td>akumarb2010</td>
+<td>Parth Chandra</td>
+<td>parthc</td>
</tr>
<tr>
-<td>Vitalii Diravka</td>
-<td>vitalii</td>
+<td>Padma Penumarthy</td>
+<td>ppadma</td>
</tr>
<tr>
-<td>Kamesh Bhallamudi</td>
-<td>kameshb</td>
+<td>Paul Rogers</td>
+<td>progers</td>
</tr>
<tr>
-<td>Kunal Khatua</td>
-<td>kunal</td>
+<td>Ryan Rawson</td>
+<td>rawson</td>
</tr>
<tr>
-<td>Volodymyr Vysotskyi</td>
-<td>volodymyr</td>
+<td>Rahul Kumar Challapalli</td>
+<td>rkins</td>
+</tr>
+<tr>
+<td>Steven Phillips</td>
+<td>smp</td>
</tr>
<tr>
<td>Sorabh Hamirwasia</td>
<td>sorabh</td>
</tr>
<tr>
+<td>Srivas</td>
+<td>srivas</td>
+</tr>
+<tr>
+<td>Sudheesh Katkam</td>
+<td>sudheesh</td>
+</tr>
+<tr>
+<td>Ted Dunning</td>
+<td>tdunning</td>
+</tr>
+<tr>
<td>Timothy Farkas</td>
<td>timothyfarkas</td>
</tr>
+<tr>
+<td>Timothy Chen</td>
+<td>tnachen</td>
+</tr>
+<tr>
+<td>Tomer Shiran</td>
+<td>tshiran</td>
+</tr>
+<tr>
+<td>Venki Korukanti</td>
+<td>venki</td>
+</tr>
+<tr>
+<td>Vitalii Diravka</td>
+<td>vitalii</td>
+</tr>
+<tr>
+<td>Vova Vysotskyi</td>
+<td>volodymyr</td>
+</tr>
+<tr>
+<td>Weijie Tong</td>
+<td>weijie</td>
+</tr>
</tbody></table>
</div>