Tongjie Chen created PARQUET-149:
------------------------------------

             Summary: provide an option to skip a page in case corrupted bytes occur
                 Key: PARQUET-149
                 URL: https://issues.apache.org/jira/browse/PARQUET-149
             Project: Parquet
          Issue Type: Improvement
            Reporter: Tongjie Chen


In case of hardware failure (disk, memory, etc.), a file may contain corrupted
bytes. Reading them can result in an ArrayIndexOutOfBoundsException and/or
garbled data.

Currently, jobs reading such Parquet files fail unless the corrupted files are
deleted or moved.

The page metadata (page header) already has a CRC field, which is not used so
far. It could be used to check the integrity of each page: if a page's data is
corrupted, skip the whole page instead of failing the job.
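A minimal sketch of the proposed check, in Java. It assumes the header CRC is a
standard CRC-32 (as computed by java.util.zip.CRC32) over the page bytes as
stored on disk, kept as a signed 32-bit int, and that a page without a CRC
simply has nothing to verify. CorruptPageSkipper, RawPage and the
skipCorruptedPages flag are hypothetical names for illustration, not
parquet-mr APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch, not parquet-mr API: verify each page's CRC before
// decoding and, if the option is enabled, drop pages whose checksum mismatches.
public class CorruptPageSkipper {

    // A raw page: its bytes as read from disk plus the optional CRC from the header.
    public record RawPage(byte[] data, Integer crc) {}

    private final boolean skipCorruptedPages; // hypothetical option proposed by this issue
    private int skippedPages = 0;

    public CorruptPageSkipper(boolean skipCorruptedPages) {
        this.skipCorruptedPages = skipCorruptedPages;
    }

    // Assumption: standard CRC-32 over the page bytes, truncated to a signed int.
    public static int crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue();
    }

    // A page with no CRC cannot be checked, so it is accepted as-is.
    public static boolean isIntact(RawPage page) {
        return page.crc() == null || page.crc() == crc32(page.data());
    }

    // Keeps intact pages; skips corrupted ones, or fails fast if skipping is off.
    public List<RawPage> verifiedPages(List<RawPage> pages) {
        List<RawPage> result = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            RawPage page = pages.get(i);
            if (isIntact(page)) {
                result.add(page);
            } else if (skipCorruptedPages) {
                skippedPages++; // drop the whole page instead of failing the job
            } else {
                throw new IllegalStateException("CRC mismatch in page " + i);
            }
        }
        return result;
    }

    public int skippedPages() {
        return skippedPages;
    }

    public static void main(String[] args) {
        byte[] good = {1, 2, 3, 4};
        RawPage ok = new RawPage(good, crc32(good));

        byte[] bad = {1, 2, 3, 4};
        int storedCrc = crc32(bad);
        bad[2] ^= 0x10; // simulate a bit flipped after the CRC was written
        RawPage corrupt = new RawPage(bad, storedCrc);

        CorruptPageSkipper skipper = new CorruptPageSkipper(true);
        List<RawPage> kept = skipper.verifiedPages(List.of(ok, corrupt));
        System.out.println(kept.size() + " kept, " + skipper.skippedPages() + " skipped");
        // prints "1 kept, 1 skipped"
    }
}
```

With the option disabled, the reader still fails, but with an explicit CRC
error rather than an ArrayIndexOutOfBoundsException or silently garbled values.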

related issue: PARQUET-148



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
