[ https://issues.apache.org/jira/browse/ORC-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429115#comment-17429115 ]
Dongjoon Hyun commented on ORC-1028:
------------------------------------
I agree with [~Guiyankuang]'s analysis.
The basic sanity check verifies the magic string "ORC" in the first 3 bytes of
the file and the "ORC" magic at the end of the PostScript.
Truncated files are usually caught by this check, but recovering a truncated
file is not straightforward.
According to the error messages, those files seem to be corrupted in the middle
as well.
In that case, recovering data from them is even harder.
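
The magic-string check described above can be sketched in a few lines of
Python. This is only an illustration, not the ORC reader's own code: the
function name quick_orc_check is hypothetical, and it assumes (per my reading
of the ORC file format) that the last byte of the file holds the PostScript
length and that the three bytes before it are the PostScript's "ORC" magic.

```python
import os

MAGIC = b"ORC"


def quick_orc_check(path):
    """Cheap header/tail sanity check; returns a list of problems found.

    Hypothetical helper for illustration. An empty list does NOT prove the
    file is readable: corruption in the middle of the file is not detected.
    """
    size = os.path.getsize(path)
    # Need room for the 3-byte header, the 3-byte trailing magic,
    # and the 1-byte PostScript length.
    if size < 2 * len(MAGIC) + 1:
        return ["file too short to be an ORC file"]

    problems = []
    with open(path, "rb") as f:
        # The file should start with the 3-byte magic string.
        if f.read(len(MAGIC)) != MAGIC:
            problems.append("missing 'ORC' header magic")
        # Assumed layout of the tail: ... "ORC" <PostScript length byte>
        f.seek(size - (len(MAGIC) + 1))
        tail = f.read(len(MAGIC) + 1)
    if tail[:len(MAGIC)] != MAGIC:
        problems.append("missing 'ORC' magic at end of PostScript")
    return problems
```

A file whose tail was cut off would normally fail the second test, which is
why truncation tends to show up here. Files damaged in the middle can pass
this check and still fail when their stripes are actually read.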
> Orc file damage detection
> -------------------------
>
> Key: ORC-1028
> URL: https://issues.apache.org/jira/browse/ORC-1028
> Project: ORC
> Issue Type: New Feature
> Components: Java
> Reporter: 任建亭
> Priority: Major
>
> On our cluster, we found a lot of corrupted ORC files. How can I quickly
> detect whether an ORC file is corrupted? Is there a tool available to repair
> corrupted ORC files?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)