[
https://issues.apache.org/jira/browse/GOBBLIN-1949?focusedWorklogId=888518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-888518
]
ASF GitHub Bot logged work on GOBBLIN-1949:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Nov/23 22:11
Start Date: 02/Nov/23 22:11
Worklog Time Spent: 10m
Work Description: Will-Lo merged PR #3818:
URL: https://github.com/apache/gobblin/pull/3818
Issue Time Tracking
-------------------
Worklog Id: (was: 888518)
Time Spent: 0.5h (was: 20m)
> Add option to detect malformed orc during commit
> ------------------------------------------------
>
> Key: GOBBLIN-1949
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1949
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Hanghang Liu
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hot fix for the malformed ORC file issue.
> The issue was observed during compaction: a malformed ORC file cannot be
> opened. There are two scenarios. In the first, the file contains only the
> last keyword of the PostScript, meaning only the bytes "ORC" were written
> to the file. In the second, the file contains actual data but does not
> end properly, so the read fails in ReaderImpl.extractPostScript().
> The fix adds a validation step for the ORC file during commit, more
> specifically after closing the writer and before committing. This prevents
> malformed data from being moved to the output directory or even published
> to the destination.
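
The two failure scenarios above can be illustrated with a small, standard-library-only check of the file tail. This is a rough sketch, not the validation added in PR #3818; the class name OrcTailCheck and the method hasPlausibleTail are hypothetical. It assumes the ORC layout in which the last byte of the file stores the PostScript length and the PostScript itself ends with the magic bytes "ORC". A real validation step would more likely open the file with the ORC reader so that the footer and metadata are parsed as well.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Gobblin or ORC.
public class OrcTailCheck {
    private static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the end of the file looks like a complete ORC PostScript. */
    public static boolean hasPlausibleTail(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            // Scenario one: the file holds nothing but the bytes "ORC",
            // so there is no room for the one-byte PostScript length.
            if (len < MAGIC.length + 1) {
                return false;
            }
            // Assumed layout: the final byte is the PostScript length.
            raf.seek(len - 1);
            int psLen = raf.readUnsignedByte();
            // Scenario two: data was written but the tail is missing, so the
            // "length" byte is arbitrary data and usually fails these checks.
            if (psLen < MAGIC.length || psLen > len - 1) {
                return false;
            }
            // The PostScript must end with the magic bytes.
            byte[] tail = new byte[MAGIC.length];
            raf.seek(len - 1 - MAGIC.length);
            raf.readFully(tail);
            return Arrays.equals(tail, MAGIC);
        }
    }
}
```

In a commit path, such a check would run after the writer is closed and before the file is moved, with a failed check causing the commit to be rejected. It is only a cheap pre-filter: a file can pass it and still have a corrupt footer.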