This is an automated email from the ASF dual-hosted git repository.
ianmcook pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/main by this push:
new 47eb6fb8a5f [Website] Correct statement about compression in FAQ (#541)
47eb6fb8a5f is described below
commit 47eb6fb8a5fa396fe234dfd5756c04100ca67588
Author: Ian Cook <[email protected]>
AuthorDate: Sat Sep 14 11:42:34 2024 -0700
[Website] Correct statement about compression in FAQ (#541)
---
faq.md | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/faq.md b/faq.md
index f5894b2df85..e61340a65ac 100644
--- a/faq.md
+++ b/faq.md
@@ -180,10 +180,12 @@ This efficiency comes at the cost of relatively expensive reading into memory,
as Parquet data cannot be directly operated on but must be decoded in
large chunks.
-Conversely, Arrow is an in-memory format meant for direct and efficient use
-for computational purposes. Arrow data is not compressed (or only lightly so,
-when using dictionary encoding) but laid out in natural format for the CPU,
-so that data can be accessed at arbitrary places at full speed.
+Conversely, Arrow is an in-memory format meant primarily for direct and
+efficient use for computational purposes. Arrow data is typically not
+compressed but laid out in natural format for the CPU, so that data can be
+accessed at arbitrary places at full speed. (However, Arrow does provide a
+limited set of options for increasing space efficiency, including
+dictionary encoding, run-end encoding, and buffer compression.)
Therefore, Arrow and Parquet complement each other
and are commonly used together in applications. Storing your data on disk