findepi commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-3158601704
Metadata-based types, like Arrow extension types, are definitely a viable and incrementally executable path. However, the end result is far from optimal: there is the official type system, represented by `DataType`, and there is the _actual_ type system, represented by `DataType + properties bag`. Eventually, every use of `DataType` will need to be revisited and updated.

> For example could one make a UUID type stored as a fixed length binary that works with col = 'abc' as well as col = 'a-b-c'?

This is a good example, and it's easy to create many others like it:

- JSON backed by either Utf8 or Binary: what's the equality function? Byte-for-byte, or structural?
- VARIANT backed by Binary, containing serialized data: the equality _must_ deserialize.
- UUID backed by ...

In none of these cases does the backing type (Utf8 or Binary) carry any meaning, and it cannot be used for anything on its own. Yet it's a valid `DataType`, so it _can be (erroneously) used for something_, be it an optimizer rule, a function call, coercion logic, etc.

Is this the end state we want to achieve?
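To make the UUID question concrete, here is a minimal sketch, assuming a UUID stored as 16 raw bytes (as in a `FixedSizeBinary(16)` column) and a SQL string literal that may or may not contain hyphens. The helpers `parse_uuid_literal`, `backing_eq`, and `uuid_eq` are hypothetical illustrations, not DataFusion or arrow-rs APIs. The point is that comparing against the backing type's bytes gives a meaningless answer, while a correct comparison needs knowledge of the logical type.

```rust
// Hypothetical sketch only: none of these functions exist in DataFusion or arrow-rs.
// Assumption: a UUID is *stored* as 16 raw bytes (like FixedSizeBinary(16)),
// while a SQL literal arrives as a hyphenated or unhyphenated hex string.

/// Parse a UUID string literal into 16 bytes.
/// Leniently drops every '-' (wherever it appears) and expects 32 hex digits.
fn parse_uuid_literal(s: &str) -> Option<[u8; 16]> {
    let hex: Vec<u8> = s.bytes().filter(|&b| b != b'-').collect();
    if hex.len() != 32 {
        return None;
    }
    let mut out = [0u8; 16];
    for (i, pair) in hex.chunks(2).enumerate() {
        // to_digit(16) accepts both lowercase and uppercase hex digits.
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        out[i] = (hi * 16 + lo) as u8;
    }
    Some(out)
}

/// Naive equality on the backing type: compare the literal's UTF-8 bytes
/// directly with the stored bytes. The backing type "works", but the answer
/// has nothing to do with UUID semantics.
fn backing_eq(stored: &[u8; 16], literal: &str) -> bool {
    stored.as_slice() == literal.as_bytes()
}

/// Type-aware equality: interpret the literal as a UUID first,
/// then compare the decoded 16 bytes.
fn uuid_eq(stored: &[u8; 16], literal: &str) -> bool {
    parse_uuid_literal(literal).map_or(false, |lit| &lit == stored)
}

fn main() {
    let stored = parse_uuid_literal("123e4567-e89b-12d3-a456-426614174000").unwrap();
    for lit in [
        "123e4567-e89b-12d3-a456-426614174000",
        "123e4567e89b12d3a456426614174000",
    ] {
        println!(
            "{lit}: backing_eq={} uuid_eq={}",
            backing_eq(&stored, lit),
            uuid_eq(&stored, lit)
        );
    }
}
```

In this sketch the equality rule lives with the logical type (UUID), not with the physical `DataType`; any code path that only sees the backing bytes, such as a generic optimizer rule or coercion, would silently fall back to something like `backing_eq`.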