KazydubB commented on issue #1829: DRILL-7096: Develop vector for canonical 
Map<K,V>
URL: https://github.com/apache/drill/pull/1829#issuecomment-516003913
 
 
   Hi @paul-rogers, thank you for your elaborate answer.
   This map is intended to be a map in Java's sense, with strict key and map 
types. The key type was indeed planned to be primitive only while the value 
could have been of any type either primitive or complex, but I accidentally 
removed the check when finalizing the changes :(. Just to be clear, key type is 
not limited to `VARCHAR`, but can be of any other, like `INT`, `BIGINT`, 
`FLOAT4` etc.
   
   This new type is intended to be used for sources which support this type of 
maps, i.e. Parquet files, Hive tables etc.
   Support for EVF was not considered because I thought it was not supporting 
readers which support such map type, but thank you for guidance, I'll look into 
it. Of course, it should be added.
   
   The vector `extends` `RepeatedMapVector` and is essentially a repeated map 
vector (as you've noted) with constraints regarding its children and type of 
one of the child, 'key' (it `@Override`s 
`RepeatedMapVector.Accessor#getObject(int)` as well). This constraint is 
implemented in `TrueMapVector#putChild(String, ValueVector)` which dissalows 
the vector to have other children fields. Additionally, new reader and writer 
was introduced which also `extend` (inherit from) repeated map's reader and 
writer respectively. The new writer differs from its parent by how it handles 
children's offsets and introduces two essential methods, `startKeyValuePair()` 
and `endKeyValuePair()`, to separate entries. And new type's reader adds 
methods to find index for a given key and read a value for a given key into 
passed `ValueHolder`.
   With this approach for writer, writing data into new map vector is not any 
different than writing data into repeated map (of course, if user (developer) 
is not trying to access "unknown" field, other than defined `"key"` and 
`"value"` on the writer for the new type). Flatten, for example, does behave 
the same for the two vectors. In places these two vectors (`TrueMapVector` and 
`RepeatedMapVector`) have different behavior, this is done in code; in other 
case the common behavior (logic) is used.
   That being said, I do not see a reason to create another 
`AbstractRepeatedMap` as I think extending `RepeatedMapVector` suffices (unless 
we really need to separate these two vectors).
   
   I am going to add documentation to code to avoid reverse engineering for 
developers (sorry for making you go through this!) and add unit tests for the 
new `ValueVector` and its writer and slightly refactor the code to implement 
aforementioned constraints. Also, I'll rename the type to `DICT`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to