KazydubB edited a comment on issue #1829: DRILL-7096: Develop vector for canonical Map<K,V> URL: https://github.com/apache/drill/pull/1829#issuecomment-516003913 Hi @paul-rogers, thank you for your elaborate comment. This map is intended to be a map in Java's sense, with strict key and map types. The key type was indeed planned to be primitive only while the value could have been of any type either primitive or complex, but I accidentally removed the check when finalizing the changes :(. Just to be clear, key type is not limited to `VARCHAR`, but can be of any other, like `INT`, `BIGINT`, `FLOAT4` etc. This new type is intended to be used for sources which support this type of maps, i.e. Parquet files, Hive tables etc. Support for EVF was not considered because I thought it was not supporting readers which support such map type, but thank you for guidance, I'll look into it. Of course, it should be added. The vector `extends` `RepeatedMapVector` and is essentially a repeated map vector (as you've noted) with constraints regarding its children and type of one of the child, 'key' (it `@Override`s `RepeatedMapVector.Accessor#getObject(int)` as well). This constraint is implemented in `TrueMapVector#putChild(String, ValueVector)` which dissalows the vector to have other children fields. Additionally, new reader and writer was introduced which also `extend` (inherit from) repeated map's reader and writer respectively. The new writer differs from its parent by how it handles children's offsets and introduces two essential methods, `startKeyValuePair()` and `endKeyValuePair()`, to separate entries. And new type's reader adds methods to find index for a given key and read a value for a given key into passed `ValueHolder`. With this approach for writer, writing data into new map vector is not any different than writing data into repeated map (of course, if user (developer) is not trying to access "unknown" field, other than defined `"key"` and `"value"` on the writer for the new type). Flatten, for example, does behave the same for the two vectors. In places these two vectors (`TrueMapVector` and `RepeatedMapVector`) have different behavior, this is done in code; in other case the common behavior (logic) is used. That being said, I do not see a reason to create another `AbstractRepeatedMap` as I think extending `RepeatedMapVector` suffices (unless we really need to separate these two vectors). I am going to add documentation to code to avoid reverse engineering for developers (sorry for making you go through this!) and add unit tests for the new `ValueVector` and its writer and slightly refactor the code to implement aforementioned constraints. Also, I'll rename the type to `DICT`.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
