abstractdog commented on code in PR #3833: URL: https://github.com/apache/hive/pull/3833#discussion_r1084204246
########## ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java: ########## @@ -224,7 +237,252 @@ private static void skipCompressedIndex(boolean isCompressed, PositionProvider i index.getNext(); } - protected static class StringStreamReader extends StringTreeReader + public static class StringDictionaryTreeReaderHive extends TreeReader { Review Comment: I was trying to understand the scenario here and the way I see this: the current PR code is not the proper one as we end up Hive on ORC 1.8.x but without an important optimization introduced in ORC-1060, so if we have to copy some ORC code anyway, let's have ORC-1060 at least here (sometimes I feel we need to port changes on separate jiras, but here we can merge them together) I see that the basic confusion comes from the fact that in ORC we have a common StringTreeReader which encapsulates different kinds of string readers like StringDirectTreeReader, StringDictionaryTreeReader, but in hive's StringStreamReader we have dictionary-related properties like _dictionaryStream, _lengthStream, which is confusing...if we're already subclassing ORC tree readers, we should follow it like: ``` HIVE -> ORC StringStreamReader -> StringTreeReader (as it is now) StringDictionaryStreamReader -> StringDictionaryTreeReader StringDirectStreamReader -> StringDirectTreeReader ``` this is a change that should be done regardless of ORC 1.8 upgrade in my opinion, and prior to ORC 1.8 upgrade once we follow ORC tree class hierarchy, we have a better chance to adapt changes like ORC-1060, where e.g. only the dictionary reader has been changed guys, if you agree with this, let's address the above problem in a separate hive ticket first, let's take the time, it's worth the time, especially if turns out that the ORC 1.8 upgrade becomes a clearer thing -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org