Hanghang Liu created GOBBLIN-1949:
-------------------------------------

             Summary: Add option to detect malformed orc during commit
                 Key: GOBBLIN-1949
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1949
             Project: Apache Gobblin
          Issue Type: Bug
            Reporter: Hanghang Liu


Hot fix for malformed ORC file issue.

The issue was observed during compaction: a malformed ORC file can’t be 
opened. There are two malformed-file scenarios. In the first, the file 
contains only the last keyword of the Postscript, meaning only the bytes of 
"ORC" were written to the file. In the second, the file contains actual data 
but doesn't end properly, so the read fails in 
ReaderImpl.extractPostScript().

The fix is to add a validation step for the ORC file during commit, more 
specifically after closing the writer and before committing. This prevents 
malformed data from being moved to the output directory or even published to 
the destination. 
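
As a rough illustration of the kind of check described above, here is a
minimal JDK-only sketch. It is not the actual Gobblin change, which may simply
open the file with the ORC reader. The class and method names are
hypothetical. It assumes the usual ORC layout: the file starts with the
3-byte magic "ORC", the last byte holds the Postscript length, and the
Postscript itself ends with the magic "ORC". Very old files that carry the
magic only in the header would be rejected by this sketch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Gobblin or the ORC library. */
final class OrcTailCheck {
  private static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

  private OrcTailCheck() {}

  /** Returns true if the end of the file plausibly holds a complete Postscript. */
  static boolean looksWellFormed(Path file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      long len = raf.length();
      // Need at least the header magic, the Postscript magic and the length byte.
      // This rejects a file that contains nothing but "ORC".
      if (len < 2L * MAGIC.length + 1) {
        return false;
      }

      // The last byte of the file is the Postscript length.
      raf.seek(len - 1);
      int psLen = raf.readUnsignedByte();
      // The Postscript must hold the magic and must not overlap the header.
      if (psLen < MAGIC.length || psLen > len - 1 - MAGIC.length) {
        return false;
      }

      // The bytes right before the length byte must be the magic "ORC".
      byte[] tail = new byte[MAGIC.length];
      raf.seek(len - 1 - MAGIC.length);
      raf.readFully(tail);
      return Arrays.equals(tail, MAGIC);
    }
  }
}
```

In a commit path this would be called after the writer is closed and before
the file is moved; a false result would fail the commit so the file is never
published. Passing this check does not prove the file is fully readable, it
only catches the two truncation patterns mentioned above.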



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
