[ https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989053#comment-16989053 ]
ASF GitHub Bot commented on DRILL-7359: --------------------------------------- paul-rogers commented on pull request #1870: DRILL-7359: Add support for DICT type in RowSet Framework URL: https://github.com/apache/drill/pull/1870#discussion_r353522274 ########## File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java ########## @@ -268,4 +274,51 @@ public String getAsString() { } return requireReader(type).getAsString(); } + + private UnionReaderImpl getNullReader() { + AbstractObjectReader[] nullVariants = new AbstractObjectReader[variants.length]; + for (int i = 0; i < variants.length; i++) { + nullVariants[i] = variants[i].createNullReader(); + } + return new NullUnionReader(schema(), unionAccessor, nullVariants); + } + + private static class NullUnionReader extends UnionReaderImpl { + + private NullUnionReader(ColumnMetadata metadata, VectorAccessor va, AbstractObjectReader[] variants) { Review comment: @KazydubB, thanks for the explanation. It clarifies something I wondered about: the role of the null state readers. Let's talk a bit about the intended semantics for the readers. I should be able to write code that caches the readers: ``` DictReader dr = rowReader.dict("myDict"); ScalarReader kw = dr.key().scalar(); MapReader vr = dr.value.tuple(); ``` That is, I should be able to use the same key reader (`kr`) and value reader (`vr`) for all members. Then, I should either be able to iterate over the values: ``` while (dr.next()) { System.out.println(kr.getString()); System.out.println(vr.getAsString()); } ``` If things work that way, then we don't need a special reader for the null state: the value reader (here, `vr`) would simply return `true` from `isNull()`: ``` while (dr.next()) { System.out.println(kr.getString()); if (vr.isNull()) { System.out.println("NULL"); } else { System.out.println(vr.getAsString()); } } ``` (Ignoring, for the moment, that `getAsString()` already does a null check.) Note that, if we read past the end, both the key and value writers will point to nothing. I think that, for the array reader, accessing the value one past the end is an error. For `DICT`, we could simply declare that the value is NULL. Then, I was going to sketch out the same behavior if we search for a key and found...nothing. Seems that the `DictReader` has no way to search for a key. If it did, it would not be clear what type to use for the key. So, that is a bit of a mess. Let's assume we did have something to find a specific value, like we do for the arrays where we have `setPosn(int index)`. Maybe we have `findStringKey()` etc. For testing, we could just have `findKey(Object key)` and do a switch on type as we do for the other readers. So: ``` dr.findStringKey("foo"); if (vr.isNull()) { System.out.println("NULL"); } else { System.out.println(vr.getAsString()); } ``` Or, even: ``` if (dr.findStringKey("foo")) { System.out.println(vr.getAsString()); } ``` The only way to find the value is to do a linear search. If we do a linear search, and no value exists, the index will point one past the end of the dict. And, above, we suggested that the key and value readers could return `true` for `isNull()` in those cases. Given this, unless I'm missing something, I don't see that we need any kind of null value reader at all. In fact, I'd even say that we don't want such a reader if we'd have to do: ``` ObjectReader kr = dr.findStringKey("foo"); if (kr.isNull) { // kr is a special null reader } else { // kr is a Map reader } ``` The above clutters the code and makes it impossible to cache the value reader, which will slow inner-most loops that want to use the reader abstractions. What do you think? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support for DICT type in RowSet Framework > --------------------------------------------- > > Key: DRILL-7359 > URL: https://issues.apache.org/jira/browse/DRILL-7359 > Project: Apache Drill > Issue Type: New Feature > Reporter: Bohdan Kazydub > Assignee: Bohdan Kazydub > Priority: Major > Fix For: 1.18.0 > > > Add support for new DICT data type (see DRILL-7096) in RowSet Framework -- This message was sent by Atlassian Jira (v8.3.4#803005)