[
https://issues.apache.org/jira/browse/PARQUET-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255407#comment-15255407
]
Kai Zheng commented on PARQUET-594:
-----------------------------------
Thanks Wes, I see. It assumes the context of the Parquet format definition,
where I also found:
{quote}
Checksumming
Data pages can be individually checksummed. This allows disabling of checksums
at the HDFS file level, to better support single row lookups.
{quote}
It is not clear what is meant by disabling HDFS file-level checksumming. HDFS
file/block-level checksumming isn't allowed to be disabled, although a reader
can skip checksum verification.
How does checksumming pages individually help to better support single row
lookups? Does it mean skipping HDFS file/block-level checksum verification and
verifying only the checksums of some pages? If so, that sounds good.
For reference, the line in question:
{code}
/** 32bit crc for the data below. This allows for disabling checksumming in HDFS
* if only a few pages needs to be read
**/
optional i32 crc
{code}
If convenient, I suggest renaming {{crc}} to {{crc32}}, following convention.
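To make the question concrete, here is a minimal Python sketch of what I understand per-page verification to mean: a reader that touches a single page checks only that page's checksum. The helper names ({{page_crc32_as_i32}}, {{verify_page}}) and the sample page bytes are hypothetical, and the sketch assumes the standard CRC-32 polynomial (as in zlib) computed over the raw page bytes; the format text quoted above does not spell out either detail. The unsigned CRC is reinterpreted as a signed value because the field is declared as {{i32}}.

```python
import zlib


def page_crc32_as_i32(page_bytes: bytes) -> int:
    """CRC-32 of the page bytes, reinterpreted as a signed 32-bit integer.

    Assumption: the checksum covers exactly `page_bytes` and uses the
    zlib/IEEE CRC-32 polynomial. Thrift's i32 is signed, so values at or
    above 2**31 are mapped to their negative two's-complement form.
    """
    unsigned = zlib.crc32(page_bytes) & 0xFFFFFFFF
    return unsigned - (1 << 32) if unsigned >= (1 << 31) else unsigned


def verify_page(page_bytes: bytes, stored_crc) -> bool:
    """Check one page against its stored checksum.

    The field is optional, so a missing value (None) is treated here as
    "nothing to verify"; that policy is an assumption of this sketch.
    """
    if stored_crc is None:
        return True
    return page_crc32_as_i32(page_bytes) == stored_crc


# Hypothetical pages standing in for the data pages of one column chunk.
pages = [b"page-0 values", b"page-1 values", b"page-2 values"]
stored = [page_crc32_as_i32(p) for p in pages]

# A single row lookup that lands in page 1 verifies only that page.
assert verify_page(pages[1], stored[1])

# A corrupted copy of the same page fails verification.
assert not verify_page(b"page-1 valuez", stored[1])
```

With this reading, the cost of verification scales with the pages actually read rather than with the whole HDFS block.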
Thanks for clarifying my questions.
> Support CRC checksums in pages
> ------------------------------
>
> Key: PARQUET-594
> URL: https://issues.apache.org/jira/browse/PARQUET-594
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Uwe L. Korn
>