Aggarwal-Raghav opened a new pull request, #5033: URL: https://github.com/apache/hive/pull/5033
### What changes were proposed in this pull request?
[HIVE-28026](https://issues.apache.org/jira/browse/HIVE-28026)

### Why are the changes needed?
**Query**: `select * from <table_name>`

**Explanation**: Running the above query on a Hive proto table spawns multiple Tez containers to process the data. If a container handles multiple HDFS splits and the combined size of the decompressed data exceeds 2GB, the query fails with the following error:
`"While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length."`

This happens because of the following statement in [CodedInputStream](https://github.com/protocolbuffers/protobuf/blob/54489e95e01882407f356f83c9074415e561db00/java/core/src/main/java/com/google/protobuf/CodedInputStream.java#L2712C7-L2712C16): _byteLimit += totalBytesRetired + pos;_
_byteLimit_ hits an integer overflow because _totalBytesRetired_ keeps accumulating every byte read so far, since the CodedInputStream is initialized only once per container.
https://github.com/apache/hive/blob/564d7e54d2360488611da39d0e5f027a2d574fc1/ql/src/java/org/apache/tez/dag/history/logging/proto/ProtoMessageWritable.java#L96

This is different from the issue reproduced in https://github.com/zabetak/protobuf-large-message. There, a single proto data file is larger than 2GB, whereas in my case multiple files together add up to 2GB.

**Limitation**: This fix still does not resolve the issue described in https://github.com/protocolbuffers/protobuf/issues/11729

### Does this PR introduce _any_ user-facing change?
NO

### Is the change a dependency upgrade?
NO

### How was this patch tested?
On a cluster

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
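The overflow described in the pull request above can be illustrated with a minimal, self-contained sketch. `ModelDecoder`, `pushLimit`, `consume`, and `limitAfter` are hypothetical names for a simplified model; this is not the protobuf `CodedInputStream` source and not the change made in this PR. The model only assumes a decoder that is reused across messages and keeps a running `int` count of bytes already read, and the 64 MiB message size is an arbitrary choice.

```java
public class ByteLimitOverflowSketch {

    /** Hypothetical, simplified model of a decoder that is reused for many messages. */
    static final class ModelDecoder {
        private int totalBytesRetired = 0; // running count of bytes already consumed
        private int pos = 0;               // offset in the current buffer (kept at 0 in this model)

        /** Same shape as the statement quoted in the PR: byteLimit += totalBytesRetired + pos. */
        int pushLimit(int byteLimit) {
            byteLimit += totalBytesRetired + pos; // int arithmetic wraps silently on overflow
            return byteLimit;
        }

        /** Pretend a whole message of the given size has been read; the counter is never reset. */
        void consume(int size) {
            totalBytesRetired += size;
        }
    }

    /** Absolute limit computed for the next message after reading messagesRead messages. */
    static int limitAfter(int messagesRead, int messageSize) {
        ModelDecoder decoder = new ModelDecoder();
        for (int i = 0; i < messagesRead; i++) {
            decoder.consume(messageSize);
        }
        return decoder.pushLimit(messageSize);
    }

    public static void main(String[] args) {
        int size = 64 * 1024 * 1024; // 64 MiB per message (arbitrary choice)
        System.out.println(limitAfter(30, size)); // 31 * 64 MiB, still below 2^31
        System.out.println(limitAfter(31, size)); // 32 * 64 MiB = 2^31, wraps to a negative int
    }
}
```

In this model no single message is anywhere near 2GB. The limit turns negative only because the byte counter carries over from one message to the next, which matches the multi-file scenario described in the PR.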
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
