Hanghang Liu created GOBBLIN-1949:
-------------------------------------
Summary: Add option to detect malformed orc during commit
Key: GOBBLIN-1949
URL: https://issues.apache.org/jira/browse/GOBBLIN-1949
Project: Apache Gobblin
Issue Type: Bug
Reporter: Hanghang Liu
Hot fix for malformed ORC file issue.
The issue was observed during compaction: a malformed ORC file can't be
opened. There are two scenarios of malformed files. In the first, the file
contains only the last keyword of the PostScript, meaning only the bytes of
"ORC" were written to the file. In the second, the file contains concrete data
but doesn't end properly, so the read fails in ReaderImpl.extractPostScript().
The fix is to add a validation step for the ORC file during commit, more
specifically after closing the writer and before the commit. This prevents
malformed data from being moved to the output directory or even published to
the destination.
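
To illustrate the kind of check described above, here is a minimal sketch. It is
not the actual Gobblin change: the class and method names (OrcTailCheck,
looksWellFormed) are hypothetical, and it only inspects the file tail with the
Java standard library, assuming the ORC layout in which the last byte of the
file is the PostScript length and the PostScript itself ends with the bytes
"ORC". A real validation step would more likely open the file with the ORC
reader itself.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not part of Gobblin or ORC).
final class OrcTailCheck {
    private static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file tail looks like a complete ORC PostScript. */
    static boolean looksWellFormed(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            // Scenario 1: a file holding only "ORC" is too short to have a tail.
            if (len < MAGIC.length + 1) {
                return false;
            }
            // The last byte is assumed to hold the PostScript length.
            raf.seek(len - 1);
            int psLen = raf.readUnsignedByte();
            if (psLen < MAGIC.length || psLen + 1 > len) {
                return false;
            }
            // Scenario 2: data that was cut off will not end with the magic
            // bytes immediately before the length byte.
            byte[] tail = new byte[MAGIC.length];
            raf.seek(len - 1 - MAGIC.length);
            raf.readFully(tail);
            return Arrays.equals(tail, MAGIC);
        }
    }
}
```

In a commit path, such a check would run after the writer is closed and before
the file is moved, and a failing file would be rejected instead of being
committed. Passing this heuristic does not prove the file is readable; it only
rules out the two scenarios listed in this issue.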