Hi Paul,

Regarding your point "We can also handle a map projection: `a.b` which
matches:

* A (possibly repeated) map
* A (possibly repeated) DICT with VARCHAR keys
* A UNION (because a union might contain a possibly-repeated map)
* A LIST (because the list can contain a union which might contain a
possibly-repeated map)":

I am not sure why `a.b` should be possible for a REPEATED MAP; it looks
like a shortcut of some sort. It looks wrong with respect to data types,
doesn't it? Consider an analogy in Java, pretending that
`Map<String, Integer>` represents a Drill MAP: given
`Map<String, Integer>[] a = ...;`, the expression `a.get("b")` does not
yield an array of Integer. The notation could, however, be an alias for
some 'function', like `Integer[] array = collect(a, "b")`. This does not
currently work for REPEATED MAP in Drill, though such behaviour is present
in Hive. (I am not saying it is wrong to support this for a REPEATED MAP;
it may be useful.)
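
To make the typing argument concrete, here is a minimal Java sketch of the
analogy. The class name, the `sample()` data and the `collect` helper are
hypothetical and exist only for illustration; returning null for an element
that lacks the key is an assumption, not a statement about Drill or Hive.

```java
import java.util.Arrays;
import java.util.Map;

final class RepeatedMapProjection {

  // Hypothetical helper: gathers the value stored under `key` in every
  // element of the array. Elements without the key contribute null
  // (an assumption made for this sketch).
  static Integer[] collect(Map<String, Integer>[] maps, String key) {
    Integer[] result = new Integer[maps.length];
    for (int i = 0; i < maps.length; i++) {
      result[i] = maps[i].get(key);
    }
    return result;
  }

  // Illustrative data standing in for a REPEATED MAP column value.
  @SuppressWarnings({"unchecked", "rawtypes"}) // generic array creation
  static Map<String, Integer>[] sample() {
    return new Map[] {Map.of("b", 1), Map.of("b", 2, "c", 3), Map.of("c", 4)};
  }

  public static void main(String[] args) {
    Map<String, Integer>[] a = sample();
    // a.get("b");                    // rejected: an array has no get(String)
    Integer single = a[0].get("b");   // per-element access is well typed
    Integer[] all = collect(a, "b");  // the explicit "shortcut" as a function
    System.out.println(single + " " + Arrays.toString(all));
  }
}
```

So `a.b` on a repeated map only type-checks if it is read as an implicit
`collect(a, "b")`, which is exactly the shortcut in question.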

In the case of REPEATED DICT we _may_ choose not to support such a
"shortcut", and instead provide UDFs with the needed functionality.

Regarding using keys in a filter: I think it is a good idea to provide UDFs
for such needs. Hive, for example, has the following functions for (Hive's)
MAP [1] (see "Collection Functions"):
array<K> map_keys(Map<K, V>)
array<V> map_values(Map<K, V>)
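
As a rough illustration of how UDFs in the style of map_keys and map_values
could back a key-based filter, here is a small Java sketch. All names
(`MapKeyFilterSketch`, `mapKeys`, `mapValues`, `filterByKey`) are
hypothetical; this is neither Drill's nor Hive's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapKeyFilterSketch {

  // Analogue of map_keys: the keys of one map value as a list.
  static <K, V> List<K> mapKeys(Map<K, V> m) {
    return new ArrayList<>(m.keySet());
  }

  // Analogue of map_values: the values of one map value as a list.
  static <K, V> List<V> mapValues(Map<K, V> m) {
    return new ArrayList<>(m.values());
  }

  // Keeps only the rows whose map contains `key`, expressed through
  // mapKeys so the filter needs no special projection syntax.
  static <V> List<Map<String, V>> filterByKey(List<Map<String, V>> rows, String key) {
    List<Map<String, V>> kept = new ArrayList<>();
    for (Map<String, V> row : rows) {
      if (mapKeys(row).contains(key)) {
        kept.add(row);
      }
    }
    return kept;
  }
}
```

With something like this, a filter on keys becomes an ordinary function
call over the map value rather than a special case of `a.b`.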


But yes, we must treat projections as generally as possible until the real
schema is known, and this is a hard task.


[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-OperatorsonComplexTypes
