Hello All,
I would like to start moving forward with Map type support and begin
working on implementations. I believe we just need to define the specifics
of the metadata representation before getting started. Previously, there
was a thread [1] that discussed adding Map as a logical type and I'll try
to summarize where we are currently.
Map has been added as a logical type and defined in the Flatbuffer schema
format with 1 field "keysSorted" which indicates if the child keys vector
has been presorted. A Map is a nested type that is represented as
List<entry: Struct<key: K, value: V>>.
I think these are the 2 main issues of the metadata that need to be agreed
upon:
- A Map has the same memory layout as List<entry: Struct<key: K, value: V>>,
so that implementations lacking Map support can read it as a list of
struct values.
- The `Struct` and `K` fields are constrained to be non-nullable; the other
fields can be nullable.
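To make the layout point concrete, here is a minimal pure-Python sketch of how
two map values could be stored with the List<entry: Struct<key: K, value: V>>
layout. It is not tied to any Arrow implementation; the names `offsets`, `keys`,
`values`, `entries_at`, and `map_at` are assumptions made up for illustration.

```python
# Two map values: {"a": 1, "b": None} and {"c": 3}.
# They are stored with the same layout as List<entry: Struct<key: K, value: V>>.
offsets = [0, 2, 3]      # map i spans entries offsets[i] .. offsets[i + 1]
keys = ["a", "b", "c"]   # child "key" vector, never null
values = [1, None, 3]    # child "value" vector, may contain nulls


def entries_at(i):
    """Read slot i as a list of (key, value) structs, as a List-only reader would."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(keys[start:end], values[start:end]))


def map_at(i):
    """Read slot i as a map: the same buffers viewed through the Map type."""
    return dict(entries_at(i))
```

A reader that only knows List and Struct would use something like `entries_at`,
while a Map-aware reader would use `map_at`; both read the same underlying
vectors, which is the point of keeping the layouts identical.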
Here is a sample JSON metadata representation:
{
  "name" : "MapName",
  "nullable" : true|false,
  "type" : {
    "name" : "map",
    "keysSorted" : true|false
  },
  "children" : [{
    "name" : "entry",
    "nullable" : false,
    "type" : {
      "name" : "struct"
    },
    "children" : [{
      "name" : "key",
      "nullable" : false,
      "type" : {
        "name" : K
      },
      "children" : []
    },{
      "name" : "value",
      "nullable" : true|false,
      "type" : {
        "name" : V
      },
      "children" : []
    }]
  }]
}
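
As a rough illustration of the representation above, the following Python
sketch builds the metadata for a Map field and checks the two proposed
nullability constraints. The helpers `map_field` and `check_map_field` are
hypothetical and not part of any Arrow library, and the type names "utf8" and
"bool" simply stand in for K and V.

```python
import json


def map_field(name, key_type, value_type, nullable=True,
              value_nullable=True, keys_sorted=False):
    """Build the JSON metadata dict for a Map field (illustrative helper)."""
    return {
        "name": name,
        "nullable": nullable,
        "type": {"name": "map", "keysSorted": keys_sorted},
        "children": [{
            "name": "entry",
            "nullable": False,          # the entry struct is never null
            "type": {"name": "struct"},
            "children": [
                {"name": "key", "nullable": False,   # keys are never null
                 "type": {"name": key_type}, "children": []},
                {"name": "value", "nullable": value_nullable,
                 "type": {"name": value_type}, "children": []},
            ],
        }],
    }


def check_map_field(field):
    """Raise ValueError if the proposed nullability constraints are violated."""
    entry = field["children"][0]
    key = entry["children"][0]
    if entry["nullable"] or key["nullable"]:
        raise ValueError("entry struct and key must be non-nullable")


field = map_field("MapName", "utf8", "bool", keys_sorted=True)
check_map_field(field)
print(json.dumps(field, indent=2))
```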
Any concerns or objections to the above? Hopefully that covers what needs
to be discussed; please correct me if I missed something. Thanks!
Bryan
[1]:
https://lists.apache.org/thread.html/d61f21924159718fb31d27f5c85d58d393a88708f76dff510c8da322@%3Cdev.arrow.apache.org%3E