[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894573#comment-16894573
 ] 

ASF GitHub Bot commented on DRILL-7096:
---------------------------------------

paul-rogers commented on issue #1829: DRILL-7096: Develop vector for canonical 
Map<K,V>
URL: https://github.com/apache/drill/pull/1829#issuecomment-515716535
 
 
   here seems to be little documentation of the design or implementation: I 
find myself having to reverse engineer a design from the code. Would it be 
helpful for other reviewers if we included a bit of explanation?
   
   In particular, what is the structure of the new vector? The true map vector 
seems to have a user-defined key and value type. Does this mean keys can be 
integers or maps? Do the proposed map functions handle non-Varchar keys? Do 
they handle repeated or map keys? If not, should the implementation restrict 
key types?
   
   How do we handle varying value types? (I have a map with, say, {name: 
"Fred", balance: 123}. Would the user specify a Union? Have we fixed the many 
existing problems with the Union type? Or, is the true map meant to be like a 
Java map: Map<K,V>, and not a Python or JSON map with string keys and values of 
any type?
   
   Given the title of this PR, "Develop vector for canonical Map<K,V>", the 
intent seems to be to create a Java-like map. In this case, the name "true" 
map, means "true Java map", not "true JSON map." This may seem like an 
unimportant distinction, but I do point you to [Drill's JSON data 
model](https://drill.apache.org/docs/json-data-model/).
   
   I suppose the the "V" type could be Union to implement a JSON map. Would the 
various functions that work with this map handle union?
   
   Given that this whole exercise simply creates two correlated lists, "key" 
and "value", I do wonder at the need for all the new complexity. Can't we 
simply add functions that work with an existing repeated map where the user 
tells us which column is the key and which is the value? Or, infer the key and 
value if the user uses the names "key" and "value"?
   
   A bit of documentation (a README file, a package-info file or just a Javadoc 
explanation in the vector class) would help answer these questions.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Develop vector for canonical Map<K,V>
> -------------------------------------
>
>                 Key: DRILL-7096
>                 URL: https://issues.apache.org/jira/browse/DRILL-7096
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Igor Guzenko
>            Assignee: Bohdan Kazydub
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Canonical Map<K,V> datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, 
> value:value_type> >. So implementation of value vector would be useful for 
> Arrow too.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to