Aggarwal-Raghav commented on code in PR #5391:
URL: https://github.com/apache/hive/pull/5391#discussion_r1732979618


##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcNewSplit.java:
##########
@@ -101,7 +102,7 @@ public void readFields(DataInput in) throws IOException {
       byte[] tailBuffer = new byte[tailLen];
       in.readFully(tailBuffer);
       OrcProto.FileTail fileTail = OrcProto.FileTail.parseFrom(tailBuffer);
-      orcTail = new OrcTail(fileTail, null);
+      orcTail = new OrcTail(fileTail, new BufferChunk(0, 0), -1);

Review Comment:
   @zhangbutao, in my opinion this is not caused by the ORC version upgrade.
   In the Tez-on-YARN flow, the issue surfaced only after the ORC version upgrade.
   In the Tez-on-LLAP flow, I think the issue has existed since 
[HIVE-15665](https://issues.apache.org/jira/browse/HIVE-15665), because 
https://github.com/apache/hive/blob/d0d5d6d7d11b3eece0d0bc17b429cb30dec5dc79/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L669
 requires the serialized footer. With _`hive.orc.splits.include.file.footer`_ 
enabled, that footer was null earlier and is an empty buffer in this PR; neither helps.
   
   This code requires the actual serialized buffer, which I think can only be 
obtained by calling _`extractFileTail`_. When I debugged with an empty buffer, 
the code checked the last byte of the buffer, which represents the postscript 
length. In the empty-buffer case that value is 0, so an error related to a 
**_malformed ORC file is thrown_**.
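
   To illustrate the point about the last byte, here is a minimal sketch in plain Java. It is not Hive or ORC code: `TailBufferSketch`, `postScriptLength`, and `looksUsable` are hypothetical names, and the only assumption is the one described above, that the last byte of the serialized tail holds the postscript length. It shows why a null or empty buffer cannot provide a usable value.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, NOT part of Hive or ORC. It only illustrates the
// assumption that the last byte of a serialized tail is the postscript length.
final class TailBufferSketch {

  /** Reads the last byte of the buffer as an unsigned postscript length. */
  static int postScriptLength(ByteBuffer serializedTail) {
    if (serializedTail == null || serializedTail.remaining() == 0) {
      throw new IllegalArgumentException("no serialized tail available");
    }
    // Absolute read: does not move the buffer's position.
    return serializedTail.get(serializedTail.limit() - 1) & 0xff;
  }

  /** True only if the buffer could hold a postscript plus its length byte. */
  static boolean looksUsable(ByteBuffer serializedTail) {
    if (serializedTail == null || serializedTail.remaining() == 0) {
      return false; // the null (earlier) and empty-buffer (this PR) cases
    }
    int psLen = postScriptLength(serializedTail);
    return psLen > 0 && psLen + 1 <= serializedTail.remaining();
  }

  public static void main(String[] args) {
    ByteBuffer empty = ByteBuffer.allocate(0);
    // Three made-up postscript bytes followed by the length byte (3).
    ByteBuffer fake = ByteBuffer.wrap(new byte[] {10, 20, 30, 3});
    System.out.println("empty usable: " + looksUsable(empty));
    System.out.println("fake usable:  " + looksUsable(fake));
  }
}
```

   A guard like `looksUsable` would only detect the problem; as noted above, the reader still needs the real serialized tail.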



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


