[ 
https://issues.apache.org/jira/browse/GOBBLIN-1949?focusedWorklogId=888516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-888516
 ]

ASF GitHub Bot logged work on GOBBLIN-1949:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Nov/23 21:44
            Start Date: 02/Nov/23 21:44
    Worklog Time Spent: 10m 
      Work Description: hanghangliu commented on PR #3818:
URL: https://github.com/apache/gobblin/pull/3818#issuecomment-1791581268

   For future work, we could consider cleaning up the task output dir every 
time a container starts up, to avoid reading files left behind by a failed 
container. Performance-wise, that would be better than the read-after-write 
check this patch adds.




Issue Time Tracking
-------------------

            Worklog Id:     (was: 888516)
    Remaining Estimate: 0h
            Time Spent: 10m

> Add option to detect malformed orc during commit
> ------------------------------------------------
>
>                 Key: GOBBLIN-1949
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1949
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Hanghang Liu
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hot fix for the malformed ORC file issue.
> The issue was observed during compaction: the malformed ORC file can't be 
> opened. There are two scenarios of malformed files. In the first, the file 
> only contains the last keyword of the Postscript, meaning only the bytes 
> "ORC" were written to the file. In the second, the file contains actual data 
> but doesn't end properly, so reading fails in 
> ReaderImpl.extractPostScript().
> The fix is to add a validation step for the ORC file during commit, more 
> specifically after closing the writer and before committing. This prevents 
> malformed data from being moved to the output directory and even published 
> to the destination.
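A validation step of this kind could look roughly like the following minimal sketch, assuming the standard ORC layout: a file starts with the 3-byte magic "ORC", its Postscript ends with the same magic, and the very last byte of the file holds the Postscript length. The class OrcTailCheck and its methods are hypothetical and are not the code from PR #3818. Because it inspects only the head and tail bytes, it cheaply rejects both scenarios above, but it is weaker than opening the file with the ORC reader itself (which is what actually runs ReaderImpl.extractPostScript()), and files from very old ORC versions may not match this layout.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical helper (not Gobblin or ORC API): a cheap structural check of an
 * ORC file's head and tail, intended to run after the writer is closed and
 * before the file is committed. Assumes the modern ORC layout:
 *   "ORC" header ... Postscript ending in "ORC" | 1-byte Postscript length
 */
public final class OrcTailCheck {
  private static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);
  // Smallest file this check accepts: header magic + Postscript magic + length byte.
  private static final int MIN_LENGTH = 2 * MAGIC.length + 1;

  private OrcTailCheck() {}

  /** Core check on the first 3 bytes, the last 4 bytes, and the total length. */
  private static boolean check(byte[] head, byte[] tail, long length) {
    if (length < MIN_LENGTH || !Arrays.equals(head, MAGIC)) {
      return false; // e.g. a file that only contains "ORC"
    }
    int psLen = tail[MAGIC.length] & 0xFF; // last byte of the file, read as unsigned
    if (psLen < MAGIC.length || MAGIC.length + psLen + 1L > length) {
      return false; // the claimed Postscript cannot fit in this file
    }
    // The Postscript's magic must sit directly before the length byte.
    return Arrays.equals(Arrays.copyOfRange(tail, 0, MAGIC.length), MAGIC);
  }

  /** Checks an in-memory copy of a file. */
  public static boolean looksComplete(byte[] file) {
    if (file.length < MIN_LENGTH) {
      return false;
    }
    byte[] head = Arrays.copyOfRange(file, 0, MAGIC.length);
    byte[] tail = Arrays.copyOfRange(file, file.length - (MAGIC.length + 1), file.length);
    return check(head, tail, file.length);
  }

  /** Checks a local file, reading only its first 3 and last 4 bytes. */
  public static boolean looksComplete(Path path) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
      long length = raf.length();
      if (length < MIN_LENGTH) {
        return false;
      }
      byte[] head = new byte[MAGIC.length];
      raf.readFully(head);
      byte[] tail = new byte[MAGIC.length + 1];
      raf.seek(length - tail.length);
      raf.readFully(tail);
      return check(head, tail, length);
    }
  }
}
```

In a commit path, the caller would run looksComplete(path) after the writer's close() and fail the task, instead of moving the file to the output directory, when it returns false.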



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
