I'm writing a msgpack reader and have encountered datasets where an array contains different types, for example a VARCHAR and a BINARY. It turns out the BINARY is actually a string. I know the data is probably just not modeled correctly in the first place, but I'm still going to modify the reading of lists so that the reader takes note of the type of the first element and tries to coerce subsequent elements that are not of the same type.
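To make the coercion idea concrete, here is a minimal sketch in plain Java, outside of Drill's actual writer API. The class and method names (`ListCoercion`, `coerceToFirst`) are my own, and only the one BINARY-to-VARCHAR case from my data is handled:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ListCoercion {
    // Hypothetical sketch: treat the type of the first list element as
    // authoritative and coerce later elements toward it. Only the
    // BINARY -> VARCHAR (byte[] -> String) case is handled here;
    // anything else passes through unchanged.
    public static List<Object> coerceToFirst(List<Object> in) {
        List<Object> out = new ArrayList<>();
        if (in.isEmpty()) {
            return out;
        }
        Class<?> target = in.get(0).getClass();
        for (Object element : in) {
            if (target == String.class && element instanceof byte[]) {
                // The BINARY is really UTF-8 text, so decode it to a String.
                out.add(new String((byte[]) element, StandardCharsets.UTF_8));
            } else {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> mixed = new ArrayList<>();
        mixed.add("name");                                      // VARCHAR
        mixed.add("surname".getBytes(StandardCharsets.UTF_8));  // BINARY
        System.out.println(coerceToFirst(mixed)); // prints [name, surname]
    }
}
```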
For example:

  { "column": [["name", \xAA\xBB], ["surname", \xAA\xBB]] }

However, I have another scenario where it's actually a field of a map that changes type:

  { "column": [
      { "dataType": 1, "value": 19 },
      { "dataType": 5, "value": "string data" }
  ] }

When reading such a structure, a BigInt writer is used to write out the "value" of the first map, but that same BigInt writer is then used for the "value" field of the second map. I understand that Drill will represent the "value" field in a BigInt vector in memory.

My question is how to best address situations like this one. What alternatives are there? Read the value type as ANY? This structure is deeply nested; should I add a means to ignore elements below a certain depth? Is it even possible to handle these situations gracefully? Is this a situation where a schema would be helpful in determining what to do with problematic fields?

Thank you
jc