This is an automated email from the ASF dual-hosted git repository.
bridgetb pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/drill.git
The following commit(s) were added to refs/heads/gh-pages by this push:
new e6a1c3d team update edits for DRILL-6744
e6a1c3d is described below
commit e6a1c3d454d92f6a1d575d3b7c182c7510403300
Author: Bridget Bevens <[email protected]>
AuthorDate: Fri Dec 14 13:32:27 2018 -0800
team update edits for DRILL-6744
---
.../026-parquet-filter-pushdown.md | 24 +++---
team.md | 91 ++++++++++++----------
2 files changed, 64 insertions(+), 51 deletions(-)
diff --git a/_docs/performance-tuning/026-parquet-filter-pushdown.md b/_docs/performance-tuning/026-parquet-filter-pushdown.md
index a1c5416..fe50bf7 100644
--- a/_docs/performance-tuning/026-parquet-filter-pushdown.md
+++ b/_docs/performance-tuning/026-parquet-filter-pushdown.md
@@ -6,9 +6,11 @@ parent: "Performance Tuning"
Drill 1.9 introduces the Parquet filter pushdown option. Parquet filter
pushdown is a performance optimization that prunes extraneous data from a
Parquet file to reduce the amount of data that Drill scans and reads when a
query on a Parquet file contains a filter expression. Pruning data reduces the
I/O, CPU, and network overhead to optimize Drill’s performance.
-Parquet filter pushdown is enabled by default. When a query contains a filter expression, you can run the [EXPLAIN PLAN command]({{site.baseurl}}/docs/explain/) to see if Drill applies Parquet filter pushdown to the query. You can enable and disable this feature using the [ALTER SYSTEM|SESSION SET]({{site.baseurl}}/docs/alter-system/) command with the `planner.store.parquet.rowgroup.filter.pushdown` option.
+Parquet filter pushdown is enabled by default. When a query contains a filter expression, you can run the [EXPLAIN PLAN command]({{site.baseurl}}/docs/explain/) to see if Drill applies Parquet filter pushdown to the query. You can enable and disable this feature through the `planner.store.parquet.rowgroup.filter.pushdown` option, as shown:
-As of Drill 1.13, the query planner in Drill can apply project push down, filter push down, and partition pruning to star queries in common table expressions (CTEs), views, and subqueries, for example:
+ SET `planner.store.parquet.rowgroup.filter.pushdown`='false'
+
+Starting in Drill 1.13, the query planner in Drill can apply project push down, filter push down, and partition pruning to star queries in common table expressions (CTEs), views, and subqueries, for example:
select col1 from (select * from t)
@@ -37,7 +39,7 @@ If Parquet files were created with a pre-1.10.0 version of Parquet, and the data
In Hive 2.3, Parquet files are created by a pre-1.10.0 version of Parquet. If
the data in the binary columns is in ASCII format, you can enable the
`store.parquet.reader.strings_signed_min_max` option to enable pushdown support
for VARCHAR data types. DECIMAL filter pushdown is not supported.
###Drill Generated Metadata Files
-Parquet filter pushdown for DECIMAL and VARCHAR data types may not work correctly on Drill metadata files that were generated prior to Drill 1.15. Regenerate all Drill metadata files using Drill 1.15 or later to ensure that Parquet filter pushdown works correctly on Drill generated metadata files.
+Parquet filter pushdown for DECIMAL and VARCHAR data types may not work correctly on Drill metadata files that were generated prior to Drill 1.15. Regenerate all Drill metadata files using Drill 1.15 or later to ensure that Parquet filter pushdown on VARCHAR and DECIMAL data types works correctly on Drill generated metadata files.
If the `store.parquet.reader.strings_signed_min_max` option is not enabled
during regeneration, the minimum and maximum values for the binary data will
not be written. When the binary data is in ASCII format, enabling the
`store.parquet.reader.strings_signed_min_max` option during regeneration
ensures that the minimum and maximum values are written and thus read back and
used during filter pushdown.
@@ -72,17 +74,15 @@ Currently, Parquet filter pushdown only supports filters that reference columns
Parquet filter pushdown works best if you presort the data. You do not have to
sort the entire data set at once. You can sort a subset of the data set, sort
another subset, and so on.
###Configuring Parquet Filter Pushdown
-Use the [ALTER SYSTEM|SESSION SET]({{site.baseurl}}/docs/alter-system/) command with the Parquet filter pushdown options to enable or disable the feature, and set the number of row groups for a table.
+Use the [ALTER SYSTEM]({{site.baseurl}}/docs/alter-system/) or [SET]({{site.baseurl}}/docs/set/) command with the Parquet filter pushdown options to enable or disable the related features.
The following table lists the Parquet filter pushdown options with their
descriptions and default values:
-| Option | Description | Default |
-|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
-| "planner.store.parquet.rowgroup.filter.pushdown" | Turns the Parquet filter pushdown feature on or off. | TRUE |
-| "planner.store.parquet.rowgroup.filter.pushdown.threshold" | Sets the number of row groups that a table can have. You can increase the threshold if the filter can prune many row groups. However, if this setting is too high, the filter evaluation overhead increases. Base this setting on the data set. Reduce this setting if the planning time is significant, or you do not see any benefit at runtime. | 10,000 |
-
-###Viewing the Query Plan
-Because Drill applies Parquet filter pushdown during the query planning phase, you can view the query execution plan to see if Drill pushes down the filter when a query on a Parquet file contains a filter expression. You can run the [EXPLAIN PLAN command]({{site.baseurl}}/docs/explain/) to see the execution plan for the query, as shown in the following example.
+| Option | Description [...]
+|----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| planner.store.parquet.rowgroup.filter.pushdown | Turns the Parquet filter pushdown feature on or off. [...]
+| planner.store.parquet.rowgroup.filter.pushdown.threshold | Sets the number of row groups that a table can have. You can increase the threshold if the filter can prune many row groups. However, if this setting is too high, the filter evaluation overhead increases. Base this setting on the data set. Reduce this setting if the planning time is significant, or you do not see any benefit at runtime. [...]
+| store.parquet.reader.strings_signed_min_max | Allows binary statistics usage for Parquet files created with a pre-1.10.0 version of Parquet. Files created pre-1.10.0 have incorrectly calculated statistics for UTF-8 data. If you know that data in the binary columns is in ASCII (not UTF-8), setting this option to 'true' enables statistics usage for VARCHAR and DECIMAL data types. Default is unset; empty string. Allowed values are 'true', 'false', '' (empty string [...]
**Example**
@@ -113,7 +113,7 @@ The following table lists the supported and unsupported clauses, operators, data
| Clauses | WHERE, <sup>1</sup>WITH, HAVING (HAVING is supported if Drill can pass the filter through GROUP BY.) | - |
| Operators | <sup>2</sup>BETWEEN, <sup>2</sup>ITEM, AND, OR, NOT, <sup>1</sup>IS [NOT] NULL, <sup>1</sup>IS [NOT] TRUE, <sup>1</sup>IS [NOT] FALSE, IN (An IN list is converted to OR if the number in the IN list is within a certain threshold, for example 20. If greater than the threshold, pruning cannot occur.) | - |
| Comparison Operators | <>, <, >, <=, >=, = | - |
-| Data Types | INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, TIME, <sup>1</sup>BOOLEAN (true, false), <sup>3</sup>VARCHAR columns | CHAR, Hive TIMESTAMP |
+| Data Types | INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, TIME, <sup>1</sup>BOOLEAN (true, false), <sup>3</sup>VARCHAR and DECIMAL columns | CHAR, Hive TIMESTAMP |
| Function | CAST is supported among the following types only: int, bigint, float, double, <sup>1</sup>date, <sup>1</sup>timestamp, and <sup>1</sup>time | - |
| Other | <sup>2</sup>Enabled native Hive reader, Files with multiple row groups, <sup>2</sup>Joins | - |
diff --git a/team.md b/team.md
index bd65f6d..4a8a3ec 100755
--- a/team.md
+++ b/team.md
@@ -8,42 +8,55 @@ We welcome contributions to the project. If you're interested in contributing, t
## Drill Committers
-| Name | Alias (email is <alias>@apache.org) |
-|------|-------|
-| Jacques Nadeau | jacques |
-| Tomer Shiran | tshiran |
-| Ted Dunning | tdunning |
-| Jason Frantz | jason |
-| MC Srivas | srivas |
-| Julian Hyde | jhyde |
-| Tim Chen | tnachen |
-| Mehant Baid | mehant |
-| Jinfeng Ni | jni |
-| Venki Korukanti | venki |
-| Jason Altekruse | json |
-| Aditya Kishore | adi |
-| Parth Chandra | parthc |
-| Aman Sinha | amansinha |
-| Steven Phillips | smp |
-| Bridget Bevens | bridgetb |
-| Hanifi Gunes | hg |
-| Abdelhakim Deneche | adeneche |
-| Sudheesh Katkam | sudheesh |
-| Ellen Friedman | ellenf |
-| Kris Hahn | krishahn |
-| Neeraja Rentachintala | neerajar |
-| Chris Westin | cwestin |
-| Abhishek Girish | agirish |
-| Rahul Challapalli | rkins |
-| Arina Ielchiieva | arina |
-| Paul Rogers | progers |
-| Laurent Goujon | laurent |
-| Charles Givre | cgivre |
-| Boaz Ben-Zvi | boaz |
-| Anil Kumar Batchu | akumarb2010 |
-| Vitalii Diravka | vitalii |
-| Kamesh Bhallamudi | kameshb |
-| Kunal Khatua | kunal |
-| Volodymyr Vysotskyi | volodymyr |
-| Sorabh Hamirwasia | sorabh |
-| Timothy Farkas | timothyfarkas |
+| **Name** | **Alias (email is <alias>@apache.org)** |
+|-------------------------|-------------------------------------|
+| Abdel Hakim Deneche | adeneche |
+| Aditya Kishore | adi |
+| Abhishek Girish | agirish |
+| AnilKumar B | akumarb2010 |
+| Aman Sinha | amansinha |
+| Arina Ielchiieva | arina |
+| Boaz Ben-Zvi | boaz |
+| Bridget Bevens | bridgetb |
+| Kamesh Bhallamudi | bvskamesh |
+| Charles Givre | cgivre |
+| Chunhui Shi | cshi |
+| Chris Wensel | cwensel |
+| Chris Westin | cwestin |
+| Ellen Friedman | ellenf |
+| German Shegalov | gera |
+| Gautam Parai | gparai |
+| Grant Ingersoll | gsingers |
+| Hanifi Gunes | hg |
+| Hanumath Rao Maduri | hmaduri |
+| Hsuan-Yi Chu | hsuanyichu |
+| Isabel Drost-Fromm | isabel |
+| Jacques Nadeau | jacques |
+| Jason Frantz | jason |
+| Julian Hyde | jhyde |
+| Jinfeng Ni | jni |
+| Jason Altekruse | json |
+| Karthikeyan Manivannan | karthikm |
+| Keys Botzum | kbotzum |
+| Kris Hahn | krishahn |
+| Kunal Khatua | kunal |
+| Laurent Goujon | laurent |
+| Mehant Baid | mehant |
+| Neeraja Rentachintala | neerajar |
+| Parth Chandra | parthc |
+| Padma Penumarthy | ppadma |
+| Paul Rogers | progers |
+| Ryan Rawson | rawson |
+| Rahul Kumar Challapalli | rkins |
+| Steven Phillips | smp |
+| Sorabh Hamirwasia | sorabh |
+| Srivas | srivas |
+| Sudheesh Katkam | sudheesh |
+| Ted Dunning | tdunning |
+| Timothy Farkas | timothyfarkas |
+| Timothy Chen | tnachen |
+| Tomer Shiran | tshiran |
+| Venki Korukanti | venki |
+| Vitalii Diravka | vitalii |
+| Vova Vysotskyi | volodymyr |
+| Weijie Tong | weijie |
\ No newline at end of file
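
Note on the 026-parquet-filter-pushdown.md text in this commit: it says that minimum and maximum values are read back and used during filter pushdown, and that pushdown works best on presorted data. The following is a minimal Python sketch of that idea only, not Drill's implementation. The `RowGroup` class, the `may_contain` and `prune` helpers, and the sample ranges are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    """Hypothetical stand-in for one row group's min/max column statistics."""
    name: str
    min_value: int
    max_value: int


def may_contain(group: RowGroup, low: int, high: int) -> bool:
    """Return True if the group's [min, max] range overlaps the filter range [low, high]."""
    return group.max_value >= low and group.min_value <= high


def prune(groups: List[RowGroup], low: int, high: int) -> List[RowGroup]:
    """Keep only the row groups that could hold rows satisfying low <= col <= high."""
    return [g for g in groups if may_contain(g, low, high)]


# Presorted data: ranges are narrow and do not overlap, so most groups can be skipped.
sorted_groups = [
    RowGroup("rg0", 1, 100),
    RowGroup("rg1", 101, 200),
    RowGroup("rg2", 201, 300),
]

# Unsorted data: every group spans almost the whole value range, so none can be skipped.
unsorted_groups = [
    RowGroup("rg0", 1, 290),
    RowGroup("rg1", 5, 300),
    RowGroup("rg2", 2, 299),
]

# Filter equivalent to: WHERE col BETWEEN 120 AND 150
kept_sorted = [g.name for g in prune(sorted_groups, 120, 150)]
kept_unsorted = [g.name for g in prune(unsorted_groups, 120, 150)]
print(kept_sorted)
print(kept_unsorted)
```

With the sorted layout only one of the three groups survives the filter, while the unsorted layout keeps all three, which is why the statistics only help when each group covers a narrow range of values.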