Tongjie Chen created PARQUET-149:
------------------------------------
Summary: provide an option to skip a page in case corrupted bytes occur
Key: PARQUET-149
URL: https://issues.apache.org/jira/browse/PARQUET-149
Project: Parquet
Issue Type: Improvement
Reporter: Tongjie Chen
In case of hardware failure (disk, memory, etc.), a file may contain corrupted
bytes. These can cause an ArrayIndexOutOfBoundsException and/or garbled data.
Currently, jobs reading these Parquet files fail unless the corrupted
files are deleted or moved.
Page metadata already has a CRC field (unused so far), which could be used to
check the integrity of each page. If the page data is corrupted, the reader
could skip the whole page instead of failing the job.
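A minimal sketch of the proposed check, under stated assumptions: the stored checksum is treated as a standard CRC32 over the page bytes, kept as a 32-bit int. The `Page` record, `crcOf`, `readPages`, and the `skipCorrupted` flag are hypothetical illustrations, not the parquet-mr API. The reader recomputes the CRC for each page and, on a mismatch, either skips the page (the proposed option) or fails as it does today.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class CrcPageSkipper {

    // Hypothetical stand-in for a page: raw page bytes plus the CRC value
    // recorded in its metadata (assumed here to be a 32-bit int).
    record Page(byte[] data, int storedCrc) {}

    // Computes CRC32 over the page bytes and truncates it to 32 bits
    // so it can be compared with an int-valued stored checksum.
    static int crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue();
    }

    // Returns only pages whose checksum matches. With skipCorrupted == false,
    // a mismatch throws instead, mirroring the current fail-the-job behavior.
    static List<Page> readPages(List<Page> pages, boolean skipCorrupted) {
        List<Page> valid = new ArrayList<>();
        for (Page p : pages) {
            if (crcOf(p.data()) == p.storedCrc()) {
                valid.add(p);
            } else if (skipCorrupted) {
                System.err.println("Skipping corrupted page (CRC mismatch)");
            } else {
                throw new IllegalStateException("Corrupted page: CRC mismatch");
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        byte[] good = "hello parquet".getBytes(StandardCharsets.UTF_8);
        byte[] bad = good.clone();
        bad[0] ^= 0x01; // flip one bit to simulate on-disk corruption
        List<Page> pages = List.of(
            new Page(good, crcOf(good)),
            new Page(bad, crcOf(good)) // metadata still holds the original CRC
        );
        List<Page> kept = readPages(pages, true);
        System.out.println("kept " + kept.size() + " of " + pages.size() + " pages");
    }
}
```

Skipping trades completeness for availability: rows in the skipped page are silently lost, so a real option would likely need to be opt-in and to log or count skipped pages.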
Related issue: PARQUET-148