I'm writing a msgpack reader and have encountered datasets where an array
contains different types, for example a VARCHAR and a BINARY. It turns out
the BINARY is actually a string. I know this is probably just not modeled
correctly in the first place, but I'm still going to modify the reading of
lists so that it takes note of the type of the first element in the list
and tries to coerce subsequent elements that are not of the same type.
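For what it's worth, the "first element wins" idea can be sketched outside
of Drill in a few lines (plain Python, purely to illustrate the logic; the
decode-bytes-as-UTF-8 choice is an assumption about the data):

```python
# Sketch of "first element wins" list coercion (illustration only,
# not Drill reader code). The first element fixes the expected type;
# later elements of a different type are coerced to it.
def coerce_list(items):
    if not items:
        return items
    target = type(items[0])
    out = []
    for item in items:
        if isinstance(item, target):
            out.append(item)
        elif target is str and isinstance(item, (bytes, bytearray)):
            # BINARY that is really a string: decode it (assumes UTF-8)
            out.append(item.decode("utf-8"))
        else:
            out.append(target(item))  # best-effort cast
    return out
```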

{
  "column": [["name", \xAA\xBB], ["surname", \xAA\xBB]]
}

However, I have another scenario where it's actually a field of a map
that changes type:
{
  "column": [
    {
      "dataType": 1,
      "value": 19
    },
    {
      "dataType": 5,
      "value": "string data"
    }
  ]
}

When reading such a structure, a BigInt writer is used to write out the
value of the first map, but that same BigInt writer is then used for the
value field of the second map. I understand that Drill will represent the
"value" field in a BigInt vector in memory.
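One thing I've considered: since each map carries a dataType tag, the
reader could dispatch on it instead of trusting the first value it sees.
A rough sketch of that idea (the code-to-type mapping 1 = integer,
5 = string is just my guess from the example above):

```python
# Hypothetical sketch: pick how to materialize "value" from the
# embedded dataType discriminator rather than from the first map seen.
# The numeric codes are assumptions taken from the sample data.
DECODERS = {
    1: int,   # BigInt-like
    5: str,   # VARCHAR-like
}

def read_value(entry):
    # Unknown codes fall back to string, which is the safest common
    # representation (an ANY-ish behavior).
    decode = DECODERS.get(entry["dataType"], str)
    return decode(entry["value"])
```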

My question is how to best address situations like this one. What
alternatives are there? Read the value type as ANY? This structure is
deeply nested; should I add a means to ignore elements at a certain depth?
Is it even possible to handle these situations gracefully? Is this a
situation where a schema would be helpful in determining what to do with
the problematic fields?
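If a schema is the right answer, I imagine it would look something like
this (a config fragment; the path syntax and option names here are
entirely hypothetical, just to make the question concrete):

```python
# Hypothetical user-supplied schema resolving the ambiguity up front.
# Path syntax and option names are invented for illustration.
SCHEMA = {
    "column[].value": {"coerceTo": "varchar"},  # always read as string
    "column[][1]": {"coerceTo": "varchar"},     # BINARY that is really a string
}
```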

Thank you
jc
