Kevin Yang created ARROW-18288:
----------------------------------
Summary: [GO]: pqarrow (github.com/apache/arrow/go/v9/parquet/pqarrow) cannot handle arrow's DICTIONARY field
Key: ARROW-18288
URL: https://issues.apache.org/jira/browse/ARROW-18288
Project: Apache Arrow
Issue Type: Bug
Components: Go
Affects Versions: 10.0.0, 9.0.0
Reporter: Kevin Yang
Hey, Arrow Go devs,
I was trying to save some Arrow tables to Parquet files with the help of the
[github.com/apache/arrow/go/v9/parquet/pqarrow|http://github.com/apache/arrow/go/v9/parquet/pqarrow]
package. By the way, Arrow is a great design overall, and the Go implementation is great as well.
However, one issue stands out: my original Arrow table has some DICTIONARY fields, which pqarrow does NOT currently support.
I would assume supporting them is fairly straightforward: just "denormalize" the DICTIONARY values into the corresponding plain values (string, Timestamp, etc.) and leave it to the Parquet writer to do the right thing (using DICTIONARY encoding, etc.).
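A minimal sketch of the "denormalize" step, assuming a simplified, hypothetical `DictColumn` type rather than the Arrow Go API: each row stores an index into a slice of distinct values (mirroring Arrow's dictionary-encoded layout), and decoding just looks each index up. The decoded values could then be passed to a writer that re-applies dictionary encoding on the Parquet side.

```go
package main

import "fmt"

// DictColumn is a hypothetical, simplified stand-in for a dictionary-encoded
// column (NOT the Arrow Go API): each row stores an index into Dict
// instead of the value itself.
type DictColumn[T any] struct {
	Indices []int32 // one index per row
	Valid   []bool  // false marks a null row; nil means all rows are valid
	Dict    []T     // the distinct values
}

// Denormalize expands the column back into plain values and returns them
// together with a validity slice. Null rows are left as T's zero value.
func Denormalize[T any](c DictColumn[T]) ([]T, []bool, error) {
	out := make([]T, len(c.Indices))
	valid := make([]bool, len(c.Indices))
	for i, idx := range c.Indices {
		if c.Valid != nil && !c.Valid[i] {
			continue // null row: keep zero value, valid[i] stays false
		}
		if idx < 0 || int(idx) >= len(c.Dict) {
			return nil, nil, fmt.Errorf("row %d: dictionary index %d out of range", i, idx)
		}
		out[i] = c.Dict[idx]
		valid[i] = true
	}
	return out, valid, nil
}

func main() {
	col := DictColumn[string]{
		Indices: []int32{0, 1, 0, 2},
		Valid:   []bool{true, true, false, true},
		Dict:    []string{"us-east", "eu-west", "ap-south"},
	}
	vals, valid, err := Denormalize(col)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %v\n", vals, valid)
}
```

Keeping validity separate from the values preserves nulls, which a plain `[]T` would otherwise lose.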
I could do this conversion on the fly myself by converting each DICTIONARY field into its underlying values. However, the Arrow table schema is dynamic and outside my control, and I would need to iterate through the fields (possibly nested inside structs) to locate them, so it would be much better if pqarrow supported this natively.
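The field-locating workaround could look roughly like this sketch. It assumes a hypothetical, minimal `Field` tree (not `arrow.Field` or `arrow.Schema`) and only recurses into structs, not lists or maps; it returns the dotted path of every dictionary field so each one could then be decoded.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a hypothetical, minimal schema node (NOT arrow.Field).
// Kind is "dictionary", "struct", or a primitive type name;
// Children is only used when Kind is "struct".
type Field struct {
	Name     string
	Kind     string
	Children []Field
}

// DictionaryPaths walks the schema depth-first and returns the dotted path
// of every dictionary field, including fields nested inside structs.
func DictionaryPaths(fields []Field) []string {
	var paths []string
	var walk func(prefix []string, fs []Field)
	walk = func(prefix []string, fs []Field) {
		for _, f := range fs {
			// Copy the prefix so sibling paths never share a backing array.
			path := append(append([]string{}, prefix...), f.Name)
			switch f.Kind {
			case "dictionary":
				paths = append(paths, strings.Join(path, "."))
			case "struct":
				walk(path, f.Children)
			}
		}
	}
	walk(nil, fields)
	return paths
}

func main() {
	schema := []Field{
		{Name: "id", Kind: "int64"},
		{Name: "region", Kind: "dictionary"},
		{Name: "meta", Kind: "struct", Children: []Field{
			{Name: "source", Kind: "dictionary"},
			{Name: "ts", Kind: "timestamp"},
		}},
	}
	fmt.Println(DictionaryPaths(schema))
}
```

Every caller with a dynamic schema would have to repeat this traversal (plus list and map cases), which is why native support in the writer is the more convenient place for it.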
Can anyone help? Thanks!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)