This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 233dd98cd6b [SPARK-38840][SQL] Enable `spark.sql.parquet.enableNestedColumnVectorizedReader` on master by default
233dd98cd6b is described below
commit 233dd98cd6b440870d9d96343d3f369b6e2f8dfa
Author: Chao Sun <[email protected]>
AuthorDate: Fri Apr 8 21:29:04 2022 -0700
[SPARK-38840][SQL] Enable `spark.sql.parquet.enableNestedColumnVectorizedReader` on master by default
### What changes were proposed in this pull request?
Enable `spark.sql.parquet.enableNestedColumnVectorizedReader` on master
branch so it can be covered by more tests.
### Why are the changes needed?
By default, `spark.sql.parquet.enableNestedColumnVectorizedReader` is turned
off at the moment, which means many Parquet-related tests for complex types
still go through the old row-based Parquet reader from the `parquet-mr`
library. This is not ideal, since we want to prefer the new vectorized Parquet
reader in the long term and make sure it is thoroughly tested.
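For context, the batched-versus-row-based behavior is visible in the physical plan for a nested-column scan. A minimal sketch (the table name `t` and query mirror the golden explain outputs touched by this diff; exact plan text may vary by version):

```sql
-- Illustrative sketch: a Parquet table with a nested (array) column.
CREATE TABLE t (v ARRAY<STRING>) USING parquet;

-- With spark.sql.parquet.enableNestedColumnVectorizedReader=true, the scan
-- is expected to report Batched: true and include a ColumnarToRow node,
-- matching the updated explain golden files in this diff.
EXPLAIN SELECT * FROM t WHERE v IN (ARRAY('a'), NULL);
```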
### Does this PR introduce _any_ user-facing change?
Yes, `spark.sql.parquet.enableNestedColumnVectorizedReader` is turned on by
default now, but only in master branch.
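Users who encounter problems with the vectorized nested-column path can restore the previous behavior by turning the flag back off. A minimal sketch:

```sql
-- Revert to the row-based parquet-mr reader for nested columns
-- (session-level; the same key can also be set in spark-defaults.conf).
SET spark.sql.parquet.enableNestedColumnVectorizedReader=false;
```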
### How was this patch tested?
Existing tests.
Closes #36124 from sunchao/enable.
Authored-by: Chao Sun <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
.../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out | 3 ++-
sql/core/src/test/resources/sql-tests/results/explain.sql.out | 3 ++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 28888a3fdf8..b2231f038e4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1015,7 +1015,7 @@ object SQLConf {
s"Requires ${PARQUET_VECTORIZED_READER_ENABLED.key} to be enabled.")
.version("3.3.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val PARQUET_RECORD_FILTER_ENABLED =
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
.doc("If true, enables Parquet's native record-level filtering using the pushed down " +
diff --git a/sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out b/sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out
index f5e5b46d29c..f98fb1eb2a5 100644
--- a/sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out
@@ -1125,7 +1125,8 @@ struct<plan:string>
-- !query output
== Physical Plan ==
*Filter v#x IN ([a],null)
-+- FileScan parquet default.t[v#x] Batched: false, DataFilters: [v#x IN ([a],null)], Format: Parquet, Location [not included in comparison]/{warehouse_dir}/t], PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: struct<v:array<string>>
++- *ColumnarToRow
+ +- FileScan parquet default.t[v#x] Batched: true, DataFilters: [v#x IN ([a],null)], Format: Parquet, Location [not included in comparison]/{warehouse_dir}/t], PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: struct<v:array<string>>
-- !query
diff --git a/sql/core/src/test/resources/sql-tests/results/explain.sql.out b/sql/core/src/test/resources/sql-tests/results/explain.sql.out
index 4e552d51a39..a563eda1e7b 100644
--- a/sql/core/src/test/resources/sql-tests/results/explain.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/explain.sql.out
@@ -1067,7 +1067,8 @@ struct<plan:string>
-- !query output
== Physical Plan ==
*Filter v#x IN ([a],null)
-+- FileScan parquet default.t[v#x] Batched: false, DataFilters: [v#x IN ([a],null)], Format: Parquet, Location [not included in comparison]/{warehouse_dir}/t], PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: struct<v:array<string>>
++- *ColumnarToRow
+ +- FileScan parquet default.t[v#x] Batched: true, DataFilters: [v#x IN ([a],null)], Format: Parquet, Location [not included in comparison]/{warehouse_dir}/t], PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: struct<v:array<string>>
-- !query
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]