KazydubB commented on a change in pull request #1870: DRILL-7359: Add support
for DICT type in RowSet Framework
URL: https://github.com/apache/drill/pull/1870#discussion_r354275004
##########
File path:
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/reader/UnionReaderImpl.java
##########
@@ -268,4 +274,51 @@ public String getAsString() {
}
return requireReader(type).getAsString();
}
+
+ private UnionReaderImpl getNullReader() {
+ AbstractObjectReader[] nullVariants = new
AbstractObjectReader[variants.length];
+ for (int i = 0; i < variants.length; i++) {
+ nullVariants[i] = variants[i].createNullReader();
+ }
+ return new NullUnionReader(schema(), unionAccessor, nullVariants);
+ }
+
+ private static class NullUnionReader extends UnionReaderImpl {
+
+ private NullUnionReader(ColumnMetadata metadata, VectorAccessor va,
AbstractObjectReader[] variants) {
Review comment:
@paul-rogers, first of all, thank you for your thoughts - they are very
useful.
Regarding the first case you described (I suppose you meant `ScalarReader
kw` and `MapReader vr` in the first code snippet):
one is able to write code with cached readers currently, if needed for such
iterations over key-value pairs in the form you wrote it
```
while (dr.next()) {
System.out.println(kr.getString());
System.out.println(vr.getAsString());
}
```
with the behaviour you expect (without a special `Null*Reader`s). But there
is no need for the case (here and below) to add a special `NULL`-`NULL` entry
past the end, as iteration should be over _true_ collection of entries, which
were added to the `Dict` (through `DictWriter`).
Regarding finding specific values:
there is a `DictReader#getValueReader(Object key)` method (with return type
`ObjectReader`), which allows to obtain `value`'s reader with its position set
to the one corresponding to needed `key`. If there is no one for the `key` -
the special `Null*Reader` (depends on the type of `value`) is returned. So,
this approach enforces non-caching of the value reader (as you have pointed
out). The idea was to make `null` `value`s to be handled the same, indepently
whether the `null` is from the actual value (i.e., from some entry with some
`key` and `null` `value` `"someKey" -> null`) or from absent `key`, e.g. your
example:
```
ObjectReader kr = dr.findStringKey("foo");
if (kr.isNull) {
// kr is a special null reader
} else {
// kr is a Map reader
}
```
is likely should be done, because of the nature of complex types in Drill -
they are generally non-nullable (at least the ones in standard, pre-EVF,
Drill's framework) leading to no distinction between, say, empty or `null`
`MAP`.
But good point about poorer perfomance (with the current approach).
I like your idea with `boolean findKey(...)` - something similar is done in
standard framework as well. Perhaps, I will stick to it. (In this case, your
suggestion about treating entries by the end as if they were `null` does make
sense).
About designated method for each key type - I assume it is redundant, as
`key` type validation is better to be performed once prior to obtaining the
data, and general case of `Object key` should do, shouldn't it?
(Just want to note, that array-like behaviour is useful in testing
primarily, the "real" use-case is to obtain elements by key).
P.s. Sorry, missed the comment yesterday.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services