This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 0b214f166a92 [MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java`
0b214f166a92 is described below
commit 0b214f166a92c4e6b4fdc102f7718903a1a152d5
Author: Wei Guo <[email protected]>
AuthorDate: Fri Jun 14 10:33:49 2024 +0800
[MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java`
### What changes were proposed in this pull request?
This PR replaces the Parquet repo name `parquet-mr` with `parquet-java`, and the repo link `https://github.com/apache/parquet-mr` with `https://github.com/apache/parquet-java`.
### Why are the changes needed?
The upstream repository was renamed in [INFRA-25802](https://issues.apache.org/jira/browse/INFRA-25802) and [PARQUET-2475](https://issues.apache.org/jira/browse/PARQUET-2475), so it is better to update to the latest name and link.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passed GitHub Actions (GA).
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #46963 from wayneguow/parquet.
Authored-by: Wei Guo <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
 docs/sql-data-sources-load-save-functions.md               | 2 +-
 docs/sql-data-sources-parquet.md                           | 6 +++---
 .../datasources/parquet/ParquetInteroperabilitySuite.scala | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/docs/sql-data-sources-load-save-functions.md b/docs/sql-data-sources-load-save-functions.md
index b42f6e84076d..70105c22e583 100644
--- a/docs/sql-data-sources-load-save-functions.md
+++ b/docs/sql-data-sources-load-save-functions.md
@@ -109,7 +109,7 @@ For example, you can control bloom filters and dictionary encodings for ORC data
 The following ORC example will create bloom filter and use dictionary encoding only for `favorite_color`.
 For Parquet, there exists `parquet.bloom.filter.enabled` and `parquet.enable.dictionary`, too.
 To find more detailed information about the extra ORC/Parquet options,
-visit the official Apache [ORC](https://orc.apache.org/docs/spark-config.html) / [Parquet](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop) websites.
+visit the official Apache [ORC](https://orc.apache.org/docs/spark-config.html) / [Parquet](https://github.com/apache/parquet-java/tree/master/parquet-hadoop) websites.
ORC data source:
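As context for the hunk above, here is a minimal sketch (not part of the patch) of passing the two Parquet options mentioned there through Spark's DataFrameWriter. The paths are illustrative, and the `#favorite_color` suffix follows parquet-hadoop's per-column option convention:

```scala
// Minimal sketch: enable a Parquet bloom filter for one column plus
// dictionary encoding on write. Paths and column names are illustrative.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ParquetWriteOptions").getOrCreate()
val usersDF = spark.read.load("examples/src/main/resources/users.parquet")

usersDF.write.format("parquet")
  .option("parquet.bloom.filter.enabled#favorite_color", "true")
  .option("parquet.enable.dictionary", "true")
  .save("users_with_options.parquet")
```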
diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index f5c5ccd3b89a..5a0ca595fabb 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -350,7 +350,7 @@ Dataset<Row> df2 = spark.read().parquet("/path/to/table.parquet.encrypted");
#### KMS Client
-The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/p [...]
+The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-java/blob/apache-parquet-1.13.1 [...]
<div data-lang="java" markdown="1">
{% highlight java %}
@@ -371,9 +371,9 @@ public interface KmsClient {
</div>
-An [example](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/crypto/keytools/samples/VaultClient.java) of such class for an open source [KMS](https://www.vaultproject.io/api/secret/transit) can be found in the parquet-mr repository. The production KMS client should be designed in cooperation with organization's security administrators, and built by developers with an experience in access control management. Once such class is created, it c [...]
+An [example](https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/crypto/keytools/samples/VaultClient.java) of such class for an open source [KMS](https://www.vaultproject.io/api/secret/transit) can be found in the parquet-java repository. The production KMS client should be designed in cooperation with organization's security administrators, and built by developers with an experience in access control management. Once such class is created, [...]
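To make the plug-in contract above concrete, here is a hedged Scala sketch of a custom client implementing Parquet's `KmsClient` interface. The class name, the `callKms` helper, and the endpoint layout are hypothetical; only the three overridden methods come from the interface itself:

```scala
// Hedged sketch of a custom KMS client for Parquet encryption, following the
// org.apache.parquet.crypto.keytools.KmsClient plug-in interface. The REST
// endpoint layout and callKms helper are hypothetical stand-ins.
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.crypto.keytools.KmsClient

class MyOrgKmsClient extends KmsClient {
  private var kmsUrl: String = _
  private var token: String = _

  override def initialize(
      configuration: Configuration,
      kmsInstanceID: String,
      kmsInstanceURL: String,
      accessToken: String): Unit = {
    kmsUrl = kmsInstanceURL
    token = accessToken
  }

  // Ask the KMS to encrypt a key with the given master key.
  override def wrapKey(keyBytes: Array[Byte], masterKeyIdentifier: String): String =
    callKms(s"$kmsUrl/wrap/$masterKeyIdentifier",
      java.util.Base64.getEncoder.encodeToString(keyBytes))

  // Ask the KMS to decrypt a previously wrapped key.
  override def unwrapKey(wrappedKey: String, masterKeyIdentifier: String): Array[Byte] =
    java.util.Base64.getDecoder.decode(
      callKms(s"$kmsUrl/unwrap/$masterKeyIdentifier", wrappedKey))

  private def callKms(endpoint: String, payload: String): String = {
    // Placeholder: a production client would perform an authenticated HTTPS
    // request here, using `token` for authorization.
    throw new UnsupportedOperationException("wire up to a real KMS")
  }
}
```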
-Note: By default, Parquet implements a "double envelope encryption" mode, that minimizes the interaction of Spark executors with a KMS server. In this mode, the DEKs are encrypted with "key encryption keys" (KEKs, randomly generated by Parquet). The KEKs are encrypted with MEKs in KMS; the result and the KEK itself are cached in Spark executor memory. Users interested in regular envelope encryption, can switch to it by setting the `parquet.encryption.double.wrapping` parameter to `false` [...]
+Note: By default, Parquet implements a "double envelope encryption" mode, that minimizes the interaction of Spark executors with a KMS server. In this mode, the DEKs are encrypted with "key encryption keys" (KEKs, randomly generated by Parquet). The KEKs are encrypted with MEKs in KMS; the result and the KEK itself are cached in Spark executor memory. Users interested in regular envelope encryption, can switch to it by setting the `parquet.encryption.double.wrapping` parameter to `false` [...]
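As a usage note for the paragraph above, and assuming the encryption factory and key configuration described in the surrounding documentation are already in place, switching to regular envelope encryption is a single Hadoop configuration change:

```scala
// Minimal sketch: opt out of double wrapping so DEKs are sent to the KMS
// for direct encryption with the master keys (regular envelope encryption).
spark.sparkContext.hadoopConfiguration
  .set("parquet.encryption.double.wrapping", "false")
```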
## Data Source Option
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
index fffc9e2b1924..baa11df302b0 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
@@ -213,8 +213,8 @@ class ParquetInteroperabilitySuite extends ParquetCompatibilityTest with SharedS
           // predicates because (a) in ParquetFilters, we ignore TimestampType and (b) parquet
           // does not read statistics from int96 fields, as they are unsigned. See
           // scalastyle:off line.size.limit
-          // https://github.com/apache/parquet-mr/blob/2fd62ee4d524c270764e9b91dca72e5cf1a005b7/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L419
-          // https://github.com/apache/parquet-mr/blob/2fd62ee4d524c270764e9b91dca72e5cf1a005b7/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L348
+          // https://github.com/apache/parquet-java/blob/2fd62ee4d524c270764e9b91dca72e5cf1a005b7/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L419
+          // https://github.com/apache/parquet-java/blob/2fd62ee4d524c270764e9b91dca72e5cf1a005b7/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L348
           // scalastyle:on line.size.limit
           //
           // Just to be defensive in case anything ever changes in parquet, this test checks
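For readers outside the test, here is a hedged sketch of the behavior this comment describes. The path and timestamps are illustrative; the point is that a filter on an INT96 timestamp column is evaluated by Spark itself rather than pruned via Parquet row-group statistics:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Int96FilterBehavior").getOrCreate()
import spark.implicits._

// Write a timestamp column as INT96, the legacy physical type discussed above.
spark.conf.set("spark.sql.parquet.outputTimestampType", "INT96")
Seq(Timestamp.valueOf("2024-06-14 10:33:49")).toDF("ts")
  .write.mode("overwrite").parquet("/tmp/int96_example")

// The predicate below is applied by Spark during the scan; it is not turned
// into a Parquet filter, because INT96 column statistics are not usable.
spark.read.parquet("/tmp/int96_example")
  .filter($"ts" > Timestamp.valueOf("2020-01-01 00:00:00"))
  .show()
```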