[
https://issues.apache.org/jira/browse/GOBBLIN-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hanghang Liu closed GOBBLIN-1949.
---------------------------------
> Add option to detect malformed orc during commit
> ------------------------------------------------
>
> Key: GOBBLIN-1949
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1949
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Hanghang Liu
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hot fix for the malformed ORC file issue.
> The issue was observed during compaction: a malformed ORC file can’t be
> opened. There are two kinds of malformed file. In the first, the file
> contains only the last keyword of the PostScript, meaning that only the bytes
> "ORC" were written to the file. In the second, the file contains actual data
> but doesn't end properly, so the read fails in
> ReaderImpl.extractPostScript().
> The fix is to add a validation step for the ORC file during commit, more
> specifically after closing the writer and before the commit. This prevents
> malformed data from being moved to the output directory and even published
> to the destination.
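
For illustration only, here is a minimal sketch of what a check between closing the writer and committing might look like. It is not the code from this fix. The class name OrcTailCheck and the method looksWellFormed are hypothetical, and the sketch uses only the JDK: it assumes the usual ORC layout in which the file starts with the "ORC" magic and ends with a PostScript whose last bytes are "ORC", followed by one byte holding the PostScript length. A real validation would more likely open the file with the ORC reader and treat any failure as a malformed file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Gobblin or ORC: a cheap structural check
// of the head and tail of a file that is expected to be ORC.
final class OrcTailCheck {
  private static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

  private OrcTailCheck() {}

  // Returns false for the two cases described above: a file that holds only
  // the "ORC" bytes, and a file whose tail was not written completely.
  // A true result only means the layout looks plausible, not that the
  // footer or the data can actually be decoded.
  static boolean looksWellFormed(Path file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      long size = raf.length();
      // Need at least: header magic + magic inside PostScript + length byte.
      if (size < MAGIC.length * 2 + 1) {
        return false;
      }

      // Assumption: the file begins with the "ORC" magic.
      byte[] head = new byte[MAGIC.length];
      raf.seek(0);
      raf.readFully(head);
      if (!Arrays.equals(head, MAGIC)) {
        return false;
      }

      // The final byte is taken to be the PostScript length.
      raf.seek(size - 1);
      int psLen = raf.readUnsignedByte();
      if (psLen < MAGIC.length || psLen + 1L > size - MAGIC.length) {
        return false;
      }

      // Assumption: the PostScript ends with the "ORC" magic.
      byte[] tail = new byte[MAGIC.length];
      raf.seek(size - 1 - MAGIC.length);
      raf.readFully(tail);
      return Arrays.equals(tail, MAGIC);
    }
  }
}
```

A caller could run looksWellFormed on each staged file after the writer is closed and skip or fail the commit when it returns false, so the file never reaches the output directory.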
--
This message was sent by Atlassian Jira
(v8.20.10#820010)