[ https://issues.apache.org/jira/browse/HIVE-29123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Riju Trivedi updated HIVE-29123:
--------------------------------
    Description: 
In Hive, the {{HiveProtoLoggingHook}}, and in Tez, the {{ProtoHistoryLoggingService}}, log query execution details, query plans, and other runtime statistics into protocol buffer (protobuf) files.

These protobuf files are exposed as EXTERNAL tables and are read using {{ProtobufMessageInputFormat}}.

However, in cases of abrupt *Application Master termination* or OutOfMemory errors, these proto files may be left empty or partially written. Attempting to query these EXTERNAL tables while such truncated files are present can lead to query failures, typically with an {{EOFException}}:
{code:java}
Caused by: java.io.EOFException
    at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2505)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2637)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
    at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:124)
    at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:84)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
    ... 24 more {code}
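
One possible direction, shown here as a minimal sketch rather than the actual patch (the {{EofTolerantReader}} wrapper and its placement are assumptions for illustration), is to catch the {{EOFException}} at the record-reader boundary and treat a truncated file as end of input:
{code:java}
import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;

// Hypothetical wrapper, for illustration only: converts the mid-record
// EOFException thrown for a truncated SequenceFile into a normal end of input.
public class EofTolerantReader {
  private final SequenceFile.Reader reader;

  public EofTolerantReader(SequenceFile.Reader reader) {
    this.reader = reader;
  }

  // Returns false (no more records) instead of propagating the EOFException
  // raised when a partially written proto file ends in the middle of a record.
  public boolean next(Writable key) throws IOException {
    try {
      return reader.next(key);
    } catch (EOFException e) {
      // The file was truncated by an abrupt AM kill / OOM: surface the
      // complete records read so far and stop, rather than failing the query.
      return false;
    }
  }
}
{code}
Applying this pattern inside the record reader returned by {{ProtobufMessageInputFormat}} (the {{$1.next()}} frames in the trace above) would let a query consume the readable prefix of a truncated trailing file instead of failing outright.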
 

  was:
HiveProtoLoggingHook in Hive and ProtoHistoryLoggingService in Tez log query execution details, query plans, and other runtime statistics in protobuf files. These proto files are exposed as EXTERNAL tables and read through ProtobufMessageInputFormat.

An abrupt AM kill or OOM event can result in empty or partially written proto files. Querying the table when such files are present causes a query failure with EOFException.
{code:java}
Caused by: java.io.EOFException
    at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2505)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2637)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
    at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:124)
    at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:84)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
    ... 24 more {code}
 


> Extend ProtobufInputFormat to handle EOFException for partially written proto files.
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-29123
>                 URL: https://issues.apache.org/jira/browse/HIVE-29123
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Riju Trivedi
>            Assignee: Riju Trivedi
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
