[ https://issues.apache.org/jira/browse/HIVE-16352?focusedWorklogId=474808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-474808 ]

ASF GitHub Bot logged work on HIVE-16352:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Aug/20 13:40
            Start Date: 26/Aug/20 13:40
    Worklog Time Spent: 10m 
      Work Description: gabrywu opened a new pull request #1434:
URL: https://github.com/apache/hive/pull/1434


   ### What changes were proposed in this pull request?
   1. Add AvroGenericRecordReader.nextRecord.
   2. Optimize AvroGenericRecordReader.next by adding the ability to skip blocks with an invalid sync marker.
   3. Add the enum value AVRO_SERDE_ERROR_SKIP to AvroSerdeUtils.AvroTableProperties.
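The skip behavior in item 2 can be pictured with a minimal, self-contained Java sketch. It uses no Hive or Avro classes. Instead it assumes a simplified in-memory model of an Avro container file in which every data block is followed by the file's 16-byte sync marker, and a block whose trailing marker does not match the header's marker is treated as corrupt (the condition Avro reports as "Invalid sync!"). The names SkipInvalidSyncSketch, Block and readAll are hypothetical and for illustration only. This is not the PR's implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical in-memory model of an Avro container file: each data block is
 * followed by a sync marker that should equal the one in the file header.
 * This is NOT Hive's AvroGenericRecordReader; it only illustrates the idea.
 */
public class SkipInvalidSyncSketch {

    /** One data block plus the marker that was actually found after it. */
    record Block(List<String> records, byte[] trailingSync) {}

    /** Reads all records; a corrupt block either fails the read or is skipped. */
    static List<String> readAll(byte[] fileSync, List<Block> blocks, boolean skipErrors)
            throws IOException {
        List<String> out = new ArrayList<>();
        for (Block block : blocks) {
            // A block whose trailing marker differs from the header marker is corrupt.
            if (!Arrays.equals(fileSync, block.trailingSync())) {
                if (!skipErrors) {
                    // Original behavior: fail fast, as in the reported error.
                    throw new IOException("Invalid sync!");
                }
                continue; // simplified: drop the whole corrupt block and move on
            }
            out.addAll(block.records());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[16];
        Arrays.fill(sync, (byte) 7);
        byte[] bad = new byte[16]; // all zeros, so it does not match the header marker
        List<Block> blocks = List.of(
                new Block(List.of("r1", "r2"), sync),
                new Block(List.of("garbage"), bad),
                new Block(List.of("r3"), sync));
        System.out.println(readAll(sync, blocks, true)); // prints [r1, r2, r3]
    }
}
```

Dropping the whole mismatched block keeps the sketch short. A real reader would more likely seek forward to the next valid sync marker. Avro's DataFileReader exposes sync(long) for moving to the next synchronization point after a given position, but whether the PR relies on it is not stated here.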
   
   ### Why are the changes needed?
   
   When Hive reads an Avro file whose format is corrupted, we want a simple way to skip the "Invalid sync!" errors.
   https://issues.apache.org/jira/browse/HIVE-16352
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The default value of AVRO_SERDE_ERROR_SKIP is false, which keeps the original logic.
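Because the flag defaults to false, existing tables keep failing fast on corrupt blocks unless they opt in. A hedged sketch of such an opt-in lookup, assuming the flag is read from the table's java.util.Properties: the key string below is a placeholder, since the actual property name behind AVRO_SERDE_ERROR_SKIP is not given here, and SkipFlagSketch and skipErrors are hypothetical names.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of reading an opt-in "skip errors" flag from table
 * properties. The key is a placeholder, not the name defined by the PR.
 */
public class SkipFlagSketch {
    // Placeholder key; the real name would come from AvroSerdeUtils.AvroTableProperties.
    static final String SKIP_KEY = "hypothetical.avro.serde.error.skip";

    /** A missing or non-"true" value resolves to false, preserving the original behavior. */
    static boolean skipErrors(Properties tableProps) {
        return Boolean.parseBoolean(tableProps.getProperty(SKIP_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(skipErrors(props)); // false: corrupt blocks still raise "Invalid sync!"
        props.setProperty(SKIP_KEY, "true");
        System.out.println(skipErrors(props)); // true: corrupt blocks would be skipped
    }
}
```

Boolean.parseBoolean accepts "true" in any letter case and treats every other value as false, so a typo in the property value falls back to the safe, original behavior.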
   
   ### How was this patch tested?
   
   Added unit test cases in TestAvroGenericRecordReader.
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 474808)
    Remaining Estimate: 0h
            Time Spent: 10m

> Ability to skip or repair out of sync blocks with HIVE at runtime
> -----------------------------------------------------------------
>
>                 Key: HIVE-16352
>                 URL: https://issues.apache.org/jira/browse/HIVE-16352
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Navdeep Poonia
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a file is corrupted, Hive raises the error "java.io.IOException: Invalid sync!".
> Can we have some functionality to skip or repair such blocks at runtime, to make Avro more resilient to data corruption?
> Error: java.io.IOException: java.io.IOException: java.io.IOException: While processing file s3n://<bucket>/navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_000042. java.io.IOException: Invalid sync!
>  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
