This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 59ba09a3c51 [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

59ba09a3c51 is described below

commit 59ba09a3c511b3f11a07138afae6dc9f15edf99d
Author: Yuming Wang <yumw...@ebay.com>
AuthorDate: Sat Apr 15 09:15:34 2023 +0800

[SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

### What changes were proposed in this pull request?

This PR upgrades Apache Parquet to 1.13.0. See the Apache Parquet [1.13.0 release notes](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.0/CHANGES.md?plain=1#L22-L78).

### Why are the changes needed?

1. This release includes [PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160), so we no longer need [SPARK-41952](https://issues.apache.org/jira/browse/SPARK-41952).
2. This release includes [Java Vector API support](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.0/README.md?plain=1#L88-L100).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests and benchmark tests.
TPC-DS benchmark result:

Query | Parquet 1.13.0 (1st run) | Parquet 1.12.3 (1st run) | Parquet 1.13.0 (2nd run) | Parquet 1.12.3 (2nd run) | Parquet 1.13.0 (3rd run) | Parquet 1.12.3 (3rd run)
-- | -- | -- | -- | -- | -- | --
q1.sql | 37.819 | 37.786 | 36.322 | 37.59 | 37.772 | 36.776
q2.sql | 42.132 | 41.513 | 43.189 | 42.274 | 42.859 | 42.605
q3.sql | 5.933 | 6.1 | 6.082 | 6.071 | 6.128 | 6.094
q4.sql | 335.051 | 319.173 | 322.396 | 320.977 | 324.464 | 326.822
q5.sql | 78.41 | 76.631 | 76.841 | 76.37 | 78.257 | 76.502
q6.sql | 9.006 | 9.11 | 8.737 | 8.577 | 8.729 | 9.05
q7.sql | 12.881 | 12.731 | 12.685 | 12.662 | 12.606 | 12.675
q8.sql | 10.122 | 10.092 | 10.035 | 10.853 | 10.277 | 10.841
q9.sql | 72.562 | 71.942 | 73.649 | 73.04 | 72.899 | 72.01
q10.sql | 14.127 | 13.075 | 14.276 | 13.913 | 13.281 | 13.229
q11.sql | 111.334 | 111.612 | 110.952 | 110.776 | 111.686 | 112.27
q12.sql | 3.138 | 3.854 | 3.187 | 3.613 | 3.437 | 3.306
q13.sql | 13.131 | 12.676 | 12.516 | 12.417 | 12.739 | 12.987
q14a.sql | 217.664 | 213.632 | 214.655 | 213.333 | 217.601 | 213.341
q14b.sql | 191.553 | 182.775 | 184.35 | 187.004 | 188.313 | 189.876
q15.sql | 10.308 | 10.46 | 10.304 | 9.901 | 10.175 | 10.307
q16.sql | 81.97 | 82.059 | 82.41 | 81.263 | 83.179 | 82.042
q17.sql | 28.876 | 28.905 | 30.41 | 29.573 | 29.555 | 28.837
q18.sql | 14.183 | 13.929 | 14.11 | 14.466 | 13.969 | 14.022
q19.sql | 6.611 | 7.593 | 6.652 | 6.659 | 6.446 | 6.533
q20.sql | 3.263 | 3.701 | 3.56 | 3.503 | 3.53 | 3.627
q21.sql | 2.252 | 2.188 | 2.249 | 2.128 | 2.161 | 2.252
q22.sql | 14.809 | 14.715 | 14.324 | 14.266 | 14.567 | 14.123
q23a.sql | 554.385 | 544.75 | 546.213 | 542.194 | 553.784 | 547.388
q23b.sql | 781.236 | 768.367 | 770.584 | 776.065 | 776.502 | 776.006
q24a.sql | 196.806 | 193.989 | 197.608 | 194.416 | 194.71 | 192.817
q24b.sql | 176.56 | 183.084 | 177.486 | 177.936 | 177.776 | 177.389
q25.sql | 22.323 | 22.089 | 22.665 | 22.049 | 22.248 | 22.317
q26.sql | 8.574 | 8.356 | 8.174 | 8.753 | 8.186 | 8.302
q27.sql | 9.056 | 8.252 | 8.37 | 8.319 | 8.516 | 8.38
q28.sql | 102.185 | 102.382 | 102.344 | 103.058 | 102.024 | 102.786
q29.sql | 75.655 | 75.604 | 75.217 | 75.532 | 75.835 | 76.024
q30.sql | 12.476 | 12.966 | 13.039 | 14.108 | 12.19 | 13.143
q31.sql | 26.343 | 27.632 | 26.337 | 26.791 | 26.74 | 26.098
q32.sql | 3.251 | 3.41 | 3.378 | 3.333 | 3.371 | 3.516
q33.sql | 7.143 | 6.125 | 6.85 | 6.718 | 7.067 | 6.615
q34.sql | 8.53 | 8.656 | 8.536 | 8.866 | 8.358 | 8.589
q35.sql | 35.212 | 35.571 | 35.659 | 37.631 | 36.292 | 35.603
q36.sql | 9.264 | 9.166 | 9.748 | 9.488 | 9.45 | 9.469
q37.sql | 36.368 | 35.881 | 37.023 | 36.578 | 35.823 | 36.7
q38.sql | 74.58 | 73.472 | 72.926 | 73.823 | 71.097 | 73.329
q39a.sql | 8.596 | 7.637 | 8.036 | 7.984 | 7.849 | 7.88
q39b.sql | 7.233 | 6.641 | 6.278 | 7.06 | 6.595 | 6.691
q40.sql | 17.34 | 16.558 | 16.448 | 16.864 | 16.432 | 16.413
q41.sql | 1.223 | 1.105 | 1.103 | 1.182 | 1.232 | 1.304
q42.sql | 2.464 | 2.441 | 2.554 | 2.544 | 2.314 | 2.393
q43.sql | 7.477 | 7.396 | 7.394 | 7.764 | 7.381 | 7.534
q44.sql | 30.228 | 30.516 | 30.859 | 31.057 | 30.372 | 29.008
q45.sql | 9.93 | 10.089 | 9.874 | 10.075 | 9.802 | 9.838
q46.sql | 9.544 | 9.949 | 9.503 | 9.755 | 9.395 | 9.25
q47.sql | 27.322 | 26.952 | 26.974 | 26.83 | 27.087 | 26.991
q48.sql | 14.266 | 14.39 | 14.517 | 14.684 | 14.471 | 14.61
q49.sql | 21.279 | 21.733 | 20.286 | 20.945 | 22.388 | 21.52
q50.sql | 191.416 | 194.256 | 196.701 | 194.113 | 193.354 | 191.004
q51.sql | 37.552 | 37.767 | 38.317 | 37.731 | 37.369 | 38.187
q52.sql | 2.206 | 2.406 | 2.235 | 2.362 | 2.337 | 2.278
q53.sql | 5.282 | 5.131 | 5.465 | 5.137 | 5.142 | 5.069
q54.sql | 13.039 | 12.655 | 13.047 | 12.382 | 12.992 | 12.988
q55.sql | 2.534 | 2.39 | 2.375 | 2.867 | 2.623 | 2.546
q56.sql | 7.365 | 7.087 | 6.902 | 7.406 | 7.586 | 7.081
q57.sql | 18.064 | 17.945 | 18.699 | 17.664 | 18.362 | 18.222
q58.sql | 6.198 | 6.702 | 6.109 | 6.211 | 5.9 | 6.101
q59.sql | 28.266 | 28.195 | 27.876 | 28.748 | 29.027 | 28.543
q60.sql | 6.847 | 7.143 | 7.322 | 7.1 | 7.207 | 7.215
q61.sql | 7.258 | 7.62 | 7.317 | 7.781 | 7.616 | 7.669
q62.sql | 10.334 | 11.523 | 10.389 | 10.378 | 10.072 | 10.583
q63.sql | 4.631 | 4.944 | 4.947 | 5.124 | 4.61 | 4.865
q64.sql | 249.694 | 252.117 | 254.359 | 254.813 | 253.236 | 250.401
q65.sql | 78.742 | 79.184 | 78.559 | 78.305 | 78.985 | 78.515
q66.sql | 14.98 | 14.854 | 14.794 | 14.767 | 14.781 | 14.696
q67.sql | 1019.744 | 1048.439 | 987.894 | 972.062 | 927.566 | 1002.206
q68.sql | 8.903 | 8.915 | 8.277 | 8.709 | 9.349 | 9.178
q69.sql | 13.097 | 13.01 | 14.352 | 12.036 | 12.302 | 12.843
q70.sql | 21.175 | 21.085 | 21.102 | 20.471 | 20.129 | 19.678
q71.sql | 15.13 | 15.526 | 14.929 | 15.231 | 15.406 | 15.487
q72.sql | 76.463 | 75.851 | 72.002 | 72.356 | 72.676 | 74.798
q73.sql | 5.894 | 6.09 | 5.877 | 6.051 | 6.365 | 6.634
q74.sql | 99.106 | 99.356 | 100.291 | 99.51 | 96.766 | 97.292
q75.sql | 126.625 | 128.094 | 127.364 | 128.575 | 127.418 | 125.806
q76.sql | 35.172 | 33.601 | 34.752 | 34.764 | 34.228 | 35.748
q77.sql | 8.394 | 8.01 | 7.951 | 8.061 | 7.839 | 8.348
q78.sql | 289.061 | 287.508 | 283.615 | 288.768 | 288.448 | 288.661
q79.sql | 10.048 | 9.251 | 9.396 | 9.81 | 8.607 | 8.341
q80.sql | 59.68 | 59.458 | 60.234 | 60.415 | 61.325 | 60.744
q81.sql | 17.822 | 18.815 | 18.488 | 18.95 | 17.911 | 18.113
q82.sql | 64.781 | 63.957 | 63.621 | 64.38 | 63.637 | 64.488
q83.sql | 4.686 | 4.922 | 4.635 | 4.827 | 4.678 | 5.071
q84.sql | 10.987 | 10.629 | 10.841 | 11.151 | 10.646 | 10.6
q85.sql | 12.689 | 13.304 | 13.362 | 13.19 | 13.779 | 12.657
q86.sql | 6.48 | 6.491 | 6.722 | 6.667 | 6.833 | 6.52
q87.sql | 77.589 | 77.377 | 77.177 | 77.011 | 78.339 | 78.399
q88.sql | 83.876 | 83.676 | 84.044 | 83.761 | 84.201 | 84.089
q89.sql | 6.741 | 6.564 | 6.755 | 6.708 | 6.704 | 6.794
q90.sql | 7.79 | 7.812 | 7.882 | 7.88 | 7.875 | 7.854
q91.sql | 4.072 | 3.728 | 3.883 | 3.976 | 4.151 | 4.035
q92.sql | 3.05 | 3.155 | 3.336 | 3.067 | 2.942 | 3.099
q93.sql | 356.412 | 360.731 | 358.14 | 356 | 356.108 | 358.011
q94.sql | 43.202 | 43.561 | 44.63 | 44.486 | 43.993 | 42.693
q95.sql | 197.185 | 199.657 | 193.975 | 195.843 | 201.801 | 196.113
q96.sql | 12.765 | 12.481 | 12.682 | 12.799 | 12.528 | 12.505
q97.sql | 82.895 | 82.067 | 81.754 | 82.799 | 81.788 | 81.572
q98.sql | 7.338 | 7.066 | 7.133 | 7.005 | 7.254 | 7.047
q99.sql | 18.431 | 17.874 | 17.826 | 17.861 | 17.705 | 17.878
total | 7105.675 | 7091.391 | 7030.209 | 7021.7 | 6992.413 | 7047.295

Closes #40555 from wangyum/SPARK-42926.

Authored-by: Yuming Wang <yumw...@ebay.com>
Signed-off-by: Yuming Wang <yumw...@ebay.com>
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 12 ++++++------
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 12 ++++++------
 docs/sql-data-sources-parquet.md      |  4 ++--
 pom.xml                               |  4 ++--
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index fc320529fda..5fa2ddfd367 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -229,12 +229,12 @@ orc-shims/1.8.3//orc-shims-1.8.3.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.12.3//parquet-column-1.12.3.jar
-parquet-common/1.12.3//parquet-common-1.12.3.jar
-parquet-encoding/1.12.3//parquet-encoding-1.12.3.jar
-parquet-format-structures/1.12.3//parquet-format-structures-1.12.3.jar
-parquet-hadoop/1.12.3//parquet-hadoop-1.12.3.jar
-parquet-jackson/1.12.3//parquet-jackson-1.12.3.jar
+parquet-column/1.13.0//parquet-column-1.13.0.jar
+parquet-common/1.13.0//parquet-common-1.13.0.jar
+parquet-encoding/1.13.0//parquet-encoding-1.13.0.jar
+parquet-format-structures/1.13.0//parquet-format-structures-1.13.0.jar
+parquet-hadoop/1.13.0//parquet-hadoop-1.13.0.jar
+parquet-jackson/1.13.0//parquet-jackson-1.13.0.jar
 pickle/1.3//pickle-1.3.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 25a54fdd2a9..f30984f60ea 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -215,12 +215,12 @@ orc-shims/1.8.3//orc-shims-1.8.3.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.12.3//parquet-column-1.12.3.jar
-parquet-common/1.12.3//parquet-common-1.12.3.jar
-parquet-encoding/1.12.3//parquet-encoding-1.12.3.jar
-parquet-format-structures/1.12.3//parquet-format-structures-1.12.3.jar
-parquet-hadoop/1.12.3//parquet-hadoop-1.12.3.jar
-parquet-jackson/1.12.3//parquet-jackson-1.12.3.jar
+parquet-column/1.13.0//parquet-column-1.13.0.jar
+parquet-common/1.13.0//parquet-common-1.13.0.jar
+parquet-encoding/1.13.0//parquet-encoding-1.13.0.jar
+parquet-format-structures/1.13.0//parquet-format-structures-1.13.0.jar
+parquet-hadoop/1.13.0//parquet-hadoop-1.13.0.jar
+parquet-jackson/1.13.0//parquet-jackson-1.13.0.jar
 pickle/1.3//pickle-1.3.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index 4a4a3938c86..58d90fb491b 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -257,7 +257,7 @@ REFRESH TABLE my_table;

 Since Spark 3.2, columnar encryption is supported for Parquet tables with Apache Parquet 1.12+.

-Parquet uses the envelope encryption practice, where file parts are encrypted with "data encryption keys" (DEKs), and the DEKs are encrypted with "master encryption keys" (MEKs). The DEKs are randomly generated by Parquet for each encrypted file/column. The MEKs are generated, stored and managed in a Key Management Service (KMS) of user’s choice. The Parquet Maven [repository](https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.12.3/) has a jar with a mock KMS implementati [...]
+Parquet uses the envelope encryption practice, where file parts are encrypted with "data encryption keys" (DEKs), and the DEKs are encrypted with "master encryption keys" (MEKs). The DEKs are randomly generated by Parquet for each encrypted file/column. The MEKs are generated, stored and managed in a Key Management Service (KMS) of user’s choice. The Parquet Maven [repository](https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.13.0/) has a jar with a mock KMS implementati [...]

 <div class="codetabs">

@@ -350,7 +350,7 @@ Dataset<Row> df2 = spark.read().parquet("/path/to/table.parquet.encrypted");

 #### KMS Client

-The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-mr/blob/1.12.3/parquet-hadoop/s [...]
+The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-mr/blob/1.13.0/parquet-hadoop/s [...]

 <div data-lang="java" markdown="1">
 {% highlight java %}
diff --git a/pom.xml b/pom.xml
index 198a82b8c27..9811742b866 100644
--- a/pom.xml
+++ b/pom.xml
@@ -140,7 +140,7 @@
     <kafka.version>3.4.0</kafka.version>
     <!-- After 10.15.1.3, the minimum required version is JDK9 -->
     <derby.version>10.14.2.0</derby.version>
-    <parquet.version>1.12.3</parquet.version>
+    <parquet.version>1.13.0</parquet.version>
     <orc.version>1.8.3</orc.version>
     <orc.classifier>shaded-protobuf</orc.classifier>
     <jetty.version>9.4.51.v20230217</jetty.version>
@@ -2361,7 +2361,7 @@
             <groupId>${hive.group}</groupId>
             <artifactId>hive-service-rpc</artifactId>
           </exclusion>
-          <!-- parquet-hadoop-bundle:1.8.1 conflict with 1.12.3 -->
+          <!-- parquet-hadoop-bundle:1.8.1 conflict with 1.13.0 -->
           <exclusion>
             <groupId>org.apache.parquet</groupId>
             <artifactId>parquet-hadoop-bundle</artifactId>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org