This is an automated email from the ASF dual-hosted git repository.
dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git
The following commit(s) were added to refs/heads/master by this push:
new 3d0a35f Document new Parquet format version and codecs.
3d0a35f is described below
commit 3d0a35f602f02f618deab8592606c1ba3ef8debc
Author: James Turton <[email protected]>
AuthorDate: Wed Feb 23 13:40:04 2022 +0200
Document new Parquet format version and codecs.
---
.../en/data-sources-and-file-formats/040-parquet-format.md | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/_docs/en/data-sources-and-file-formats/040-parquet-format.md b/_docs/en/data-sources-and-file-formats/040-parquet-format.md
index 5f1c8e5..3dc1b81 100644
--- a/_docs/en/data-sources-and-file-formats/040-parquet-format.md
+++ b/_docs/en/data-sources-and-file-formats/040-parquet-format.md
@@ -67,6 +67,20 @@ Use the ALTER command to set the `store.format` option.
``ALTER SYSTEM|SESSION SET `store.format` = 'parquet';``
+**Introduced in release:** 1.20.
+
+Optionally, set the Parquet format version. Parquet v2 introduced new data
encodings that may affect file size and read/write performance. Run
benchmarks with your own data to establish which version works best in your
environment. If you require interoperable Parquet files, be aware that at the
time of writing Parquet v1 enjoys much wider support than v2.
+
+``ALTER SYSTEM|SESSION SET `store.parquet.writer.format_version` = 'v2';``
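+
+For example, you can confirm which value is currently in effect by querying the
`sys.options` system table (a minimal sketch; the exact columns returned vary
between Drill versions).
+
+``SELECT * FROM sys.options WHERE name = 'store.parquet.writer.format_version';``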
+
+Also new in Drill 1.20 is an expanded set of compression codec choices, listed
in the description of the `store.parquet.compression` config option. The choice
of codec can also have a significant impact on file size and read/write
performance. If interoperability is a concern, note that the Snappy and gzip
codecs have the widest support at the time of writing.
+
+``ALTER SYSTEM|SESSION SET `store.parquet.compression` = 'zstd';``
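+
+For example, with the session options above in effect, the following CTAS
statement (a sketch using a hypothetical table name) writes a zstd-compressed
Parquet copy of the `employee.json` sample data bundled on Drill's classpath,
assuming the default writable `dfs.tmp` workspace is available.
+
+``CREATE TABLE dfs.tmp.`employee_zstd` AS SELECT * FROM cp.`employee.json`;``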
+
+{% include startnote.html %}
+Because of a mismatch between Drill's set of target platforms and those for
which a suitable open source Brotli library is available, a Brotli codec is not
bundled with Drill and must be separately installed into the `jars/3rdparty`
subdirectory if you want to work with Parquet files that use Brotli. On Linux
and macOS on amd64, the
[com.github.rdblue:brotli-codec](https://github.com/rdblue/brotli-codec/)
library is supported.
+{% include endnote.html %}
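+
+Once a suitable Brotli library has been installed, the codec is selected in the
usual way, for example (assuming your Drill version accepts `brotli` as a value
for this option):
+
+``ALTER SESSION SET `store.parquet.compression` = 'brotli';``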
+
### Configuring the Size of Parquet Files
Configuring the size of Parquet files by setting the
`store.parquet.block-size` option can improve write performance. The block size
should match the block size of MFS, HDFS, or the underlying file system.
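
For example (a sketch; the value is given in bytes, and 536870912, i.e. 512 MB,
is the default):

``ALTER SYSTEM|SESSION SET `store.parquet.block-size` = 536870912;``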