Aggarwal-Raghav commented on code in PR #5391: URL: https://github.com/apache/hive/pull/5391#discussion_r1732979618
##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcNewSplit.java:
##########
@@ -101,7 +102,7 @@ public void readFields(DataInput in) throws IOException {
       byte[] tailBuffer = new byte[tailLen];
       in.readFully(tailBuffer);
       OrcProto.FileTail fileTail = OrcProto.FileTail.parseFrom(tailBuffer);
-      orcTail = new OrcTail(fileTail, null);
+      orcTail = new OrcTail(fileTail, new BufferChunk(0, 0), -1);

Review Comment:
   @zhangbutao, in my opinion this is not caused by the ORC version upgrade. In the Tez-on-YARN flow, the issue only surfaced after the ORC version upgrade. In the Tez-on-LLAP flow, I think the issue has existed since [HIVE-15665](https://issues.apache.org/jira/browse/HIVE-15665), because https://github.com/apache/hive/blob/d0d5d6d7d11b3eece0d0bc17b429cb30dec5dc79/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L669 requires the serialized footer. With _`hive.orc.splits.include.file.footer`_ enabled, that footer was null earlier and is an empty buffer in this PR; neither helps.

   This code requires the actual serialized buffer, which I think can only be obtained by calling _`extractFileTail`_. When I debugged with an empty buffer, the code checks the last byte of the buffer, which represents the postscript length. In the empty-buffer case that value is 0, so **_an error related to a malformed ORC file is thrown_**.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: gitbox-h...@hive.apache.org
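The failure mode described in the review comment can be sketched as follows. This is only an illustration, not the ORC or Hive code: `TailBufferCheck`, `postscriptLength` and `checkTail` are hypothetical names, and treating an empty buffer as "postscript length 0" is an assumption that mirrors what was observed while debugging. The one fact it relies on is that an ORC tail ends with a single byte holding the postscript length.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not the ORC or Hive implementation. */
public class TailBufferCheck {

    /**
     * Returns the postscript length stored in the last byte of the serialized tail.
     * An empty buffer is reported as 0 (assumption, matching the debugging observation).
     */
    public static int postscriptLength(ByteBuffer serializedTail) {
        int size = serializedTail.remaining();
        if (size == 0) {
            return 0;
        }
        // Absolute get leaves the buffer position untouched; mask to read the byte as unsigned.
        return serializedTail.get(serializedTail.position() + size - 1) & 0xff;
    }

    /** Rejects a tail whose postscript cannot fit in the buffer. */
    public static void checkTail(ByteBuffer serializedTail) throws IOException {
        int psLen = postscriptLength(serializedTail);
        // The postscript plus its one-byte length marker must fit inside the buffer.
        if (psLen == 0 || psLen + 1 > serializedTail.remaining()) {
            throw new IOException("Malformed ORC tail: postscript length " + psLen
                + ", buffer size " + serializedTail.remaining());
        }
    }

    public static void main(String[] args) throws IOException {
        // A 4-byte stand-in tail: 3 placeholder postscript bytes, then the length byte (3).
        checkTail(ByteBuffer.wrap(new byte[] {1, 2, 3, 3}));
        System.out.println("non-empty tail accepted");

        try {
            // Stand-in for the zero-length buffer passed to OrcTail in this PR.
            checkTail(ByteBuffer.allocate(0));
        } catch (IOException e) {
            System.out.println("empty tail rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, a placeholder buffer is no better than `null` for any reader that needs the serialized footer; only the real tail bytes pass the check.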