abstractdog commented on code in PR #3833:
URL: https://github.com/apache/hive/pull/3833#discussion_r1084204246


##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java:
##########
@@ -224,7 +237,252 @@ private static void skipCompressedIndex(boolean 
isCompressed, PositionProvider i
     index.getNext();
   }
 
-  protected static class StringStreamReader extends StringTreeReader
+  public static class StringDictionaryTreeReaderHive extends TreeReader {

Review Comment:
   I was trying to understand the scenario here and the way I see this: the 
current PR code is not the proper one as we end up Hive on ORC 1.8.x but 
without an important optimization introduced in ORC-1060, so if we have to copy 
some ORC code anyway, let's have ORC-1060 at least here (sometimes I feel we 
need to port changes on separate jiras, but here we can merge them together)
   I see that the basic confusion comes from the fact that in ORC we have a 
common StringTreeReader which encapsulates different kinds of string readers 
like StringDirectTreeReader, StringDictionaryTreeReader, but in hive's 
StringStreamReader we have dictionary-related properties like 
_dictionaryStream, _lengthStream, which is confusing...if we're already 
subclassing ORC tree readers, we should follow it like:
   ```
   HIVE -> ORC
   StringStreamReader -> StringTreeReader (as it is now)
   StringDictionaryStreamReader -> StringDictionaryTreeReader
   StringDirectStreamReader -> StringDirectTreeReader
   ```
   this is a change that should be done regardless of ORC 1.8 upgrade in my 
opinion, and prior to ORC 1.8 upgrade
   once we follow ORC tree class hierarchy, we have a better chance to adapt 
changes like ORC-1060, where e.g. only the dictionary reader has been changed
   
   guys, if you agree with this, let's address the above problem in a separate 
hive ticket first, it's worth spending the time on it, especially if turns out 
that the ORC 1.8 upgrade becomes a clearer thing
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to