[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

ASF GitHub Bot (Jira) Thu, 05 Dec 2019 07:11:25 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988896#comment-16988896
 ]


ASF GitHub Bot commented on DRILL-7359:
---------------------------------------

KazydubB commented on pull request #1870: DRILL-7359: Add support for DICT type 
in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354275004
 
 

 ##########
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
 ##########
 @@ -268,4 +274,51 @@ public String getAsString() {
     }
     return requireReader(type).getAsString();
   }
+
+  private UnionReaderImpl getNullReader() {
+    AbstractObjectReader[] nullVariants = new 
AbstractObjectReader[variants.length];
+    for (int i = 0; i < variants.length; i++) {
+      nullVariants[i] = variants[i].createNullReader();
+    }
+    return new NullUnionReader(schema(), unionAccessor, nullVariants);
+  }
+
+  private static class NullUnionReader extends UnionReaderImpl {
+
+    private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, 
AbstractObjectReader[] variants) {
 
 Review comment:
   @paul-rogers, first of all, thank you for your thoughts - they are very 
useful.
   
   Regarding the first case you described (I suppose you meant `ScalarReader 
kw` and `MapReader vr` in the first code snippet):
   one is able to write code with cached readers currently, if needed for such 
iterations over key-value pairs in the form you wrote it
   ```
   while (dr.next()) {
     System.out.println(kr.getString());
     System.out.println(vr.getAsString());
   }
   ```
   with the behaviour you expect (I need to emphasize, that special 
`Null*Reader`s appear _only_ when finding value `ObjectReader` by certain `key` 
and the given `key` does not exist for the row). But there is no need for the 
case (here and below) to add a special `NULL`-`NULL` entry past the end, as 
iteration should be over _actual_ collection of entries, which were added to 
the `Dict` (through `DictWriter`).
   
   Regarding finding specific values:
   there is a `DictReader#getValueReader(Object key)` method (with return type 
`ObjectReader`), which allows to obtain `value`'s reader with its position set 
to the one corresponding to needed `key`. If there is no one for the `key` - 
the special `Null*Reader` (depends on the type of `value`) is returned. So, 
this approach enforces non-caching of the value reader (as you have pointed 
out). The idea was to make `null` `value`s to be handled the same, indepently 
whether the `null` is from the actual value (i.e., from some entry with some 
`key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your 
example:
   ```
   ObjectReader kr = dr.findStringKey("foo");
   if (kr.isNull) {
     // kr is a special null reader
   } else {
     // kr is a Map reader
   }
   ```
   is likely should be done, because of the nature of complex types in Drill - 
they are generally non-nullable (at least the ones in standard, pre-EVF, 
Drill's framework) leading to no distinction between, say, empty or `null` 
`MAP`. Also, worth noting, that `vr` (defined as `ObjectReader vr = 
dictReader.getValueReader("foo")`) will have correct behavior by itself as 
opposed to `NULL` emulation on `DictReader` level:
   ```
   if (dr.findStringKey("foo")) {
     System.out.println(vr.getAsString());
   }
   ```
   will likely need an `else` case also which should be emulated too (I mean, 
the `ValueReader` itself does not know if it points to `NULL` value within a 
dict as this convention is done on the dict-level; some value reader wrapper 
can be introduced of course, but that is another layer of complexity).
   
   But good point about poorer perfomance (with the current approach).
   
   I like your idea with `boolean findKey(...)` - something similar is done in 
standard framework as well. (In this case, your suggestion about treating 
entries by the end as if they were `null` does make sense). Perhaps, I will 
stick to it after it is clear this is definetely better.
   
   About designated method for each key type - I assume it is redundant, as 
`key` type validation is better to be performed once prior to obtaining the 
data, and general case of `Object key` should do, shouldn't it? 
   
   (Just want to note, that array-like behaviour is useful in testing 
primarily, the "real" use-case is to obtain elements by key).
   
   P.s. Sorry, missed the comment yesterday.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Add support for DICT type in RowSet Framework
> ---------------------------------------------
>
>                 Key: DRILL-7359
>                 URL: https://issues.apache.org/jira/browse/DRILL-7359
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Bohdan Kazydub
>            Assignee: Bohdan Kazydub
>            Priority: Major
>             Fix For: 1.18.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

Reply via email to