[GitHub] [drill] KazydubB commented on a change in pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework

GitBox Thu, 05 Dec 2019 04:09:49 -0800

KazydubB commented on a change in pull request #1870: DRILL-7359: Add support 
for DICT type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354275004


 ##########
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##########
 @@ -268,4 +274,51 @@ public String getAsString() {
     }
     return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+    AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+    for (int i = 0; i < variants.length; i++) {
+      nullVariants[i] = variants[i].createNullReader();
+    }
+    return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+    private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @paul-rogers, first of all, thank you for your thoughts - they are very 
useful.
   
   Regarding the first case you described (I suppose you meant `ScalarReader 
kw` and `MapReader vr` in the first code snippet):
   one is able to write code with cached readers currently, if needed for such 
iterations over key-value pairs in the form you wrote it
   ```
   while (dr.next()) {
     System.out.println(kr.getString());
     System.out.println(vr.getAsString());
   }
   ```
   with the behaviour you expect (without a special `Null*Reader`s). But there 
is no need for the case (here and below) to add a special `NULL`-`NULL` entry 
past the end, as iteration should be over _true_ collection of entries, which 
were added to the `Dict` (through `DictWriter`).
   
   Regarding finding specific values:
   there is a `DictReader#getValueReader(Object key)` method (with return type 
`ObjectReader`), which allows to obtain `value`'s reader with its position set 
to the one corresponding to needed `key`. If there is no one for the `key` - 
the special `Null*Reader` (depends on the type of `value`) is returned. So, 
this approach enforces non-caching of the value reader (as you have pointed 
out). The idea was to make `null` `value`s to be handled the same, indepently 
whether the `null` is from the actual value (i.e., from some entry with some 
`key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your 
example:
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
     // kr is a special null reader
   } else {
     // kr is a Map reader
   }
   ```
   is likely should be done, because of the nature of complex types in Drill - 
they are generally non-nullable (at least the ones in standard, pre-EVF, 
Drill's framework) leading to no distinction between, say, empty or `null` 
`MAP`.
   
   But good point about poorer perfomance (with the current approach).
   
   I like your idea with `boolean findKey(...)` - something similar is done in 
standard framework as well. Perhaps, I will stick to it. (In this case, your 
suggestion about treating entries by the end as if they were `null` does make 
sense).
   
   About designated method for each key type - I assume it is redundant, as 
`key` type validation is better to be performed once prior to obtaining the 
data, and general case of `Object key` should do, shouldn't it? 
   
   (Just want to note, that array-like behaviour is useful in testing 
primarily, the "real" use-case is to obtain elements by key).
   
   P.s. Sorry, missed the comment yesterday.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [drill] KazydubB commented on a change in pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework

Reply via email to