Hi Yevgeny,

It is great that you are thinking of using Arrow.

> - The problems are around the abstraction for the extension types. While I
> understand that the underlying storage needs to be supported in the library
> we don't have a way for extensions to provide its own builder which means
> the user needs to know how the extension type stores the type inside the
> binary. This creates a leaky abstraction and the need for various helper
> functions like `UUIDToBinary`

I don't have anything specific to offer in terms of the Go implementation.

However, in terms of helping define a better abstraction, one way you might
proceed is to forgo using the library support for extension types and
implement support for your custom types yourself in your application code.
Once you have figured out the most useful APIs, then perhaps you could
propose contributing them to the Arrow Go implementation.

Andrew

On Fri, Mar 3, 2023 at 5:54 AM Yevgeny Pats <y...@cloudquery.io> wrote:

> Hey folks,
>
> Hopefully this is the right place to ask. As some background I'm Yevgeny
> Pats <https://www.linkedin.com/in/yevgeny-pats-5973328b/>, Founder @
> CloudQuery <https://github.com/cloudquery/cloudquery> . We are very
> interested in migrating our protocol and Go type system to Apache Arrow.
> Extensions are a critical part for us and thus I've the following questions
> on whether it's a usage problem on my end or something that is not yet
> available. I'll give here an example for Go but I believe the same issue
> exists in all libraries/languages.
>
> Here is a public github gist
> <https://gist.github.com/yevgenypats/6969e8e598161fc2021612c780bba3eb>.
>
> What are the problems:
>
> - The problems are around the abstraction for the extension types. While I
> understand that the underlying storage needs to be supported in the library
> we don't have a way for extensions to provide its own builder which means
> the user needs to know how the extension type stores the type inside the
> binary. This creates a leaky abstraction and the need for various helper
> functions like `UUIDToBinary`
> - The other way is fine as you can have methods like ToUUID on top of the
> extension array. But this creates asymmetry in the abstraction.
> - Because we don't control the builder for extensions this creeps into
> other places like json
> <https://github.com/apache/arrow/issues/34292#issuecomment-1446653210> and
> csv where we can't control marshalling (in the same way we control all
> other built-in types). So basically for extensions that use binary type as
> underlying storage in case of json and csv those will always be encoded as
> base64 which is not very useful (think about uuid, ip address, mac
> address).
>
> The main point is that I think the right abstraction for extensions should
> provide all the apis (type, array, builder) just like built-in types,
> otherwise the abstraction is incomplete or "leaky". Of course we can still
> have limitations like the custom builder must use an underlying known
> storage (for it to work over ipc) but it can still control various other
> types like marshaling, unmarshaling, building, and so on.
>
> Hopefully this gives enough context but would love to elaborate.
>
> Thanks,
> Yevgeny
>
