This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch production
in repository https://gitbox.apache.org/repos/asf/parquet-site.git
The following commit(s) were added to refs/heads/production by this push:
new 8f9954c Fix checksumming.md (#35)
8f9954c is described below
commit 8f9954c65de61ae098aacb09a7bca9acfa066f48
Author: Jonah Gao
AuthorDate: Sun Jan 14 12:58:10 2024 +0800
Fix checksumming.md (#35)
---
content/en/docs/File Format/Data Pages/checksumming.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/content/en/docs/File Format/Data Pages/checksumming.md b/content/en/docs/File Format/Data Pages/checksumming.md
index 5e02fe1..cca6574 100644
--- a/content/en/docs/File Format/Data Pages/checksumming.md
+++ b/content/en/docs/File Format/Data Pages/checksumming.md
@@ -3,4 +3,4 @@ title: "Checksumming"
linkTitle: "Checksumming"
weight: 7
---
-Column chunks are composed of pages written back to back. The pages share a common header and readers can skip over page they are not interested in. The data for the page follows the header and can be compressed and/or encoded. The compression and encoding is specified in the page metadata.
+Pages of all kinds can be individually checksummed. This allows disabling of checksums at the HDFS file level, to better support single row lookups. Checksums are calculated using the standard CRC32 algorithm - as used in e.g. GZip - on the serialized binary representation of a page (not including the page header itself).
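The checksum rule in the added text can be sketched in Python with the standard library's `zlib.crc32`, which computes the same CRC-32 used by GZip. This is a minimal sketch assuming the caller already holds the serialized page bytes that follow the page header. The helper names (`page_checksum`, `to_signed_i32`, `verify_page`) and the signed 32-bit conversion, which would only matter if the value is stored in a signed integer field, are illustrative assumptions and not part of any Parquet library API.

```python
import zlib


def page_checksum(page_body: bytes) -> int:
    # CRC-32 (the GZip/zlib polynomial) over the page bytes that follow
    # the page header; the header itself is deliberately excluded.
    return zlib.crc32(page_body) & 0xFFFFFFFF


def to_signed_i32(value: int) -> int:
    # Hypothetical helper: reinterpret an unsigned 32-bit CRC as a signed
    # 32-bit integer, e.g. for storage in a field declared as i32.
    return value - (1 << 32) if value >= (1 << 31) else value


def verify_page(page_body: bytes, stored_crc: int) -> bool:
    # Compare in unsigned form so signed and unsigned stored values both work.
    return page_checksum(page_body) == (stored_crc & 0xFFFFFFFF)


if __name__ == "__main__":
    body = b"123456789"
    crc = page_checksum(body)
    print(hex(crc))  # 0xcbf43926, the standard CRC-32 check value
    print(verify_page(body, to_signed_i32(crc)))  # True
```

Because only the page body is covered, a reader can verify a single page it has seeked to without checksumming the rest of the file.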