Hey folks,

Hopefully this is the right place to ask. As some background, I'm Yevgeny Pats <https://www.linkedin.com/in/yevgeny-pats-5973328b/>, Founder @ CloudQuery <https://github.com/cloudquery/cloudquery>. We are very interested in migrating our protocol and Go type system to Apache Arrow. Extensions are a critical part of this for us, so I have the following questions. I'm not sure whether this is a usage problem on my end or something that is not yet available. I'll give an example in Go here, but I believe the same issue exists in all libraries/languages.
Here is a public GitHub gist: <https://gist.github.com/yevgenypats/6969e8e598161fc2021612c780bba3eb>.

What are the problems?

- The problems are around the abstraction for extension types. I understand that the underlying storage needs to be supported by the library. However, there is no way for an extension to provide its own builder, so the user needs to know how the extension type stores its values inside the binary storage. This creates a leaky abstraction and requires various helper functions like `UUIDToBinary`.
- The other direction is fine, because you can have methods like `ToUUID` on top of the extension array. But this creates an asymmetry in the abstraction.
- Because we don't control the builder for extensions, the problem spills over into other places like JSON <https://github.com/apache/arrow/issues/34292#issuecomment-1446653210> and CSV. There we can't control marshalling in the way we can for all the built-in types. So for extensions that use a binary type as the underlying storage, JSON and CSV output will always be base64-encoded, which is not very useful (think about UUIDs, IP addresses, MAC addresses).

The main point is that I think the right abstraction for extensions should provide all the APIs (type, array, builder), just like built-in types do; otherwise the abstraction is incomplete or "leaky". Of course, we can still have limitations. For example, the custom builder must use an underlying known storage type so that it works over IPC. But the extension could still control other behaviour such as marshalling, unmarshalling, building, and so on.

Hopefully this gives enough context, but I would love to elaborate.

Thanks,
Yevgeny