This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git


The following commit(s) were added to refs/heads/master by this push:
     new 3d0a35f  Document new Parquet format version and codecs.
3d0a35f is described below

commit 3d0a35f602f02f618deab8592606c1ba3ef8debc
Author: James Turton <[email protected]>
AuthorDate: Wed Feb 23 13:40:04 2022 +0200

    Document new Parquet format version and codecs.
---
 .../en/data-sources-and-file-formats/040-parquet-format.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/_docs/en/data-sources-and-file-formats/040-parquet-format.md b/_docs/en/data-sources-and-file-formats/040-parquet-format.md
index 5f1c8e5..3dc1b81 100644
--- a/_docs/en/data-sources-and-file-formats/040-parquet-format.md
+++ b/_docs/en/data-sources-and-file-formats/040-parquet-format.md
@@ -67,6 +67,20 @@ Use the ALTER command to set the `store.format` option.
 
 ``ALTER SYSTEM|SESSION SET `store.format` = 'parquet';``
 
+**Introduced in release:** 1.20.
+
+Optionally, set the Parquet format version.  Parquet v2 introduced new data encodings which may affect file size and read/write performance.  Run benchmarks with your own data to establish which version works best in your environment.  If you require interoperable Parquet files, be aware that at the time of writing Parquet v1 has much wider support than v2.
+
+``ALTER SYSTEM|SESSION SET `store.parquet.writer.format_version` = 'v2';``
+
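+With the format version set, a CTAS statement such as the following writes Parquet v2 files (a sketch that assumes the default writable `dfs.tmp` workspace and the sample `employee.json` data set bundled in the `cp` storage plugin):
+
+``CREATE TABLE dfs.tmp.`employee_v2` AS SELECT * FROM cp.`employee.json`;``
+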
+Also new in Drill 1.20 is an expanded set of compression codecs, as listed in the config option description.  Codec choice can have a significant impact on file size and read/write performance.  If interoperability is a concern, the Snappy and gzip codecs have the widest support at the time of writing.
+
+``ALTER SYSTEM|SESSION SET `store.parquet.compression` = 'zstd';``
+
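+With the codec set, a CTAS statement such as the following writes zstd-compressed Parquet files (again assuming the default writable `dfs.tmp` workspace and the bundled `cp` sample data):
+
+``CREATE TABLE dfs.tmp.`employee_zstd` AS SELECT * FROM cp.`employee.json`;``
+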
+{% include startnote.html %}
+Because of a mismatch between Drill's set of target platforms and those for which a suitable open source Brotli library is available, a Brotli codec is not bundled with Drill and must be installed separately into the jars/3rdparty subdirectory if you want to work with Parquet files that use Brotli.  On Linux and macOS on amd64, the [com.github.rdblue:brotli-codec](https://github.com/rdblue/brotli-codec/) artifact is supported.
+{% include endnote.html %}
+
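+Once a suitable codec library is installed, Brotli can be selected like any other codec, assuming `brotli` appears among the accepted values in the option description:
+
+``ALTER SYSTEM|SESSION SET `store.parquet.compression` = 'brotli';``
+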
 ### Configuring the Size of Parquet Files
 Configuring the size of Parquet files by setting the `store.parquet.block-size` can improve write performance. The block size is the size of MFS, HDFS, or the file system.
 
