This is an automated email from the ASF dual-hosted git repository.
emkornfield pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/parquet-benchmark.git
The following commit(s) were added to refs/heads/main by this push:
new 213ae16 Add instructions for footer donations
213ae16 is described below
commit 213ae168489d81e737b0da551f9510f281b249a0
Author: Alkis Evlogimenos <[email protected]>
AuthorDate: Wed Aug 28 21:19:33 2024 +0300
Add instructions for footer donations
* Add instructions for footer donations
---
README.md | 30 +++++++++++++++++++++++++++++-
bin/parquet-dump-footer.zip | Bin 0 -> 22503363 bytes
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index ec66430..8aad696 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,29 @@
-# Apache Parquet Benchmarking
+# Parquet benchmark data
+
+This repository contains Parquet benchmark data. Such data is useful to help
+optimize Parquet implementations but also advance the Parquet format itself.
+
+At this point the community requests donation of Parquet footers and especially
+footers that are large and slow to parse/process. Typically these are footers of
+wide schemata: either coming from lots of individual columns and/or deeply nested
+structs.
+
+To donate Parquet footers we have built a binary `parquet-dump-footer` as part
+of parquet tools. This utility extracts footers from parquet, scrubs binary data
+for privacy reasons and allows to pretty print (`--debug`) the result for
+inspection before submission.
+
+When you are ready to donate a footer please open a PR against this repository
+and add your footer under `footer/<name>.footer`.
+
+Use `parquet-dump-footer --help` for explanation of all the options.
+
+## alternate parquet-dump-footer binary
+
+You can find binaries in this repo for different architectures in
+`bin/parquet-dump-footer.zip`. The binaries are built using the following cmake
+configuration.
+
+```sh
+cmake .. -DCMAKE_BUILD_TYPE=Release -DARROW_ACERO=OFF -DARROW_BUILD_UTILITIES=OFF -DARROW_COMPUTE=OFF -DARROW_CSV=OFF -DARROW_DATASET=OFF -DARROW_FILESYSTEM=ON -DARROW_AZURE=ON -DARROW_HDFS=OFF -DARROW_GCS=ON -DARROW_IPC=OFF -DARROW_PARQUET=ON -DARROW_S3=ON -DARROW_JSON=OFF -DARROW_MIMALLOC=OFF -DARROW_JEMALLOC=OFF -DARROW_SUBSTRAIT=OFF -DARROW_DEPENDENCY_SOURCE=BUNDLED -DARROW_DEPENDENCY_USE_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_BUILD_SHARED=OFF -DPARQUET_BUILD_EXECUTABLES=ON
+```
diff --git a/bin/parquet-dump-footer.zip b/bin/parquet-dump-footer.zip
new file mode 100644
index 0000000..c9116a0
Binary files /dev/null and b/bin/parquet-dump-footer.zip differ