[
https://issues.apache.org/jira/browse/PARQUET-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646672#comment-17646672
]
ASF GitHub Bot commented on PARQUET-2218:
-----------------------------------------
pitrou opened a new pull request, #188:
URL: https://github.com/apache/parquet-format/pull/188
When trying to implement CRC computation in Parquet C++, we found the
format spec's wording on CRC checksumming to be ambiguous.
Clarify that CRC computation happens on the exact binary serialization
(instead of a long-winded and confusing elaboration about v1 and v2 data page
layout).
Also, clarify that CRC computation can apply to all page kinds, not only
data pages (for reference, parquet-mr currently supports checksumming v1 data
pages as well as dictionary pages).
Also, see discussion on
https://github.com/apache/parquet-format/pull/126#issuecomment-1348081137 and
below.
> [Format] Clarify CRC computation
> --------------------------------
>
> Key: PARQUET-2218
> URL: https://issues.apache.org/jira/browse/PARQUET-2218
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-format
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Minor
> Fix For: format-2.10.0
>
>
> The format spec on CRC checksumming felt ambiguous when trying to implement
> it in Parquet C++, so we should make the wording clearer.
> (see discussion on
> https://github.com/apache/parquet-format/pull/126#issuecomment-1348081137 and
> below)