Hey Zhonghao!

I'm glad you're getting good use out of the Arrow and Parquet Go
implementations! You're absolutely right, pqarrow does not currently
support Dictionary Arrow arrays. Support for Dictionary Arrays in the Go
Arrow implementation was developed *after* the pqarrow package was created,
so I never got around to adding proper Dictionary support to pqarrow once
that work was merged. Mostly, other things took priority and no one had yet
asked for it.

Could you file a Jira ticket for this at
https://issues.apache.org/jira/projects/ARROW/issues/? Make sure you mark
it with the "Go" component and use "[Go]" in the name of the ticket.

If you'd like to take a stab at implementing the support yourself, we
always welcome new contributors and you can tag me (@zeroshade) on the PR
to review it if you do. Otherwise, I'll definitely add it to my list of
things to do, but filing the Jira ticket makes the request visible as a
desired feature, so another contributor might be able to get to it before
I do. At the moment, I can't promise any timeline for when I'd be able to
work on this.

Proper Dictionary Array support for pqarrow has been on my mind as
something I wanted to get to, but it kept getting overlooked since no one
had asked about it until now. So thank you very much for reaching out!
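
For anyone picking this up, here is a minimal sketch of the "denormalize"
step described in the original message below: expanding dictionary indices
into the values they point at, so a writer only ever sees a plain column.
It deliberately uses no Arrow types. dictColumn, denseColumn and
decodeDictionary are hypothetical names made up for illustration, not part
of the arrow or pqarrow packages, and a real implementation would also need
to handle non-string value types and nested fields.

```go
package main

import "fmt"

// dictColumn is a hypothetical, simplified stand-in for a dictionary-encoded
// column: every row stores an index into a table of distinct values.
// It is NOT the Arrow Go dictionary array type.
type dictColumn struct {
	dict    []string // distinct values
	indices []int    // one index per row
	valid   []bool   // false marks a null row
}

// denseColumn is the decoded ("denormalized") form: one value per row.
type denseColumn struct {
	values []string
	valid  []bool
}

// decodeDictionary replaces each index with the value it refers to.
// Null rows keep the zero value and stay marked as invalid.
func decodeDictionary(col dictColumn) (denseColumn, error) {
	if len(col.indices) != len(col.valid) {
		return denseColumn{}, fmt.Errorf(
			"indices and validity lengths differ: %d vs %d",
			len(col.indices), len(col.valid))
	}
	out := denseColumn{
		values: make([]string, len(col.indices)),
		valid:  make([]bool, len(col.indices)),
	}
	for row, idx := range col.indices {
		if !col.valid[row] {
			continue // null row: nothing to look up
		}
		if idx < 0 || idx >= len(col.dict) {
			return denseColumn{}, fmt.Errorf(
				"row %d: index %d out of range for dictionary of size %d",
				row, idx, len(col.dict))
		}
		out.values[row] = col.dict[idx]
		out.valid[row] = true
	}
	return out, nil
}

func main() {
	col := dictColumn{
		dict:    []string{"red", "green", "blue"},
		indices: []int{2, 0, 0, 1, 0},
		valid:   []bool{true, true, false, true, true},
	}
	dense, err := decodeDictionary(col)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(dense.values)
	fmt.Println(dense.valid)
}
```

The validity slice is carried through unchanged in meaning, so nulls in the
dictionary column stay nulls in the dense column rather than turning into
empty strings that look like real values.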

Take care!
--Matt


On Thu, Oct 20, 2022 at 8:43 AM Zhonghao (Kevin) Yang <yan...@gmail.com>
wrote:

> Hey, Arrow Go Dev:
>
> I was trying to save some arrow tables out to parquet files, with the help
> of the "github.com/apache/arrow/go/v9/parquet/pqarrow" package. btw, it's
> generally a great design (of Arrow) and a great Go implementation.
>
> However, one issue sticks out: in my original arrow Table I have some
> DICTIONARY fields, which pqarrow does NOT currently support.
>
> I would assume supporting them will be quite straightforward: just
> "denormalize" the DICTIONARY value into corresponding values (string,
> Timestamp, etc), and leave it up to Parquet to do the right thing (using
> DICTIONARY encoding, etc).
>
> I would have done this conversion on-the-fly by myself, by converting each
> DICTIONARY field into underlying values. However, the arrow table schema is
> dynamic and outside my control, and I would need to iterate through fields
> (maybe structs) to locate those, so it would be much better if pqarrow
> could support this natively.
>
> Can anyone help? thanks!
>
> --
> Zhonghao (Kevin) Yang,  (yan...@gmail.com)
>
