Kengo Seki created PARQUET-2325:
-----------------------------------
Summary: Fix parquet-cli's dictionary subcommand to work with FIXED_LEN_BYTE_ARRAY
Key: PARQUET-2325
URL: https://issues.apache.org/jira/browse/PARQUET-2325
Project: Parquet
Issue Type: Bug
Components: parquet-cli
Reporter: Kengo Seki
Assignee: Kengo Seki
I created a Parquet file containing a dictionary-encoded
FIXED_LEN_BYTE_ARRAY column:
{code}
$ python
Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> tbl = pa.Table.from_arrays([pa.array(["foo", "bar", "baz"],
...     type=pa.binary(3))], ["col"])
>>> pq.write_table(tbl, use_dictionary=True, where='/tmp/example.parquet')
>>>
$ java -cp 'target/parquet-cli-1.14.0-SNAPSHOT.jar:target/dependency/*' \
    org.apache.parquet.cli.Main pages /tmp/example.parquet
Column: col
--------------------------------------------------------------------------------
page type enc count avg size size rows nulls min / max
0-D dict S _ 3 3.00 B 9 B
0-1 data S R 3 3.33 B 10 B 0 "0x626172" / "0x666F6F"
{code}
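The min / max values in the pages output are hex-encoded bytes. As a quick sanity check, they should decode to the smallest and largest of the three values written above. This is a minimal sketch using only the Python standard library; the helper name `decode_hex_stat` is made up for illustration and is not part of parquet-cli or pyarrow.

```python
def decode_hex_stat(stat: str) -> bytes:
    """Turn a hex-encoded statistic such as "0x626172" into raw bytes."""
    # Drop the "0x" prefix if present, then parse the remaining hex digits.
    return bytes.fromhex(stat.removeprefix("0x"))


# Values copied from the min / max column of the pages output.
min_value = decode_hex_stat("0x626172")
max_value = decode_hex_stat("0x666F6F")
print(min_value, max_value)
```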
However, the dictionary subcommand fails on that column with an "Unknown dictionary type" error:
{code}
$ java -cp 'target/parquet-cli-1.14.0-SNAPSHOT.jar:target/dependency/*' \
    org.apache.parquet.cli.Main dictionary /tmp/example.parquet -c col
Row group 0 dictionary for "col":
Argument error: Unknown dictionary type: FIXED_LEN_BYTE_ARRAY
{code}
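For reference, the Parquet format stores PLAIN-encoded FIXED_LEN_BYTE_ARRAY values back to back with no length prefix, which fits the 9 B dictionary page reported above (3 values of 3 bytes each). The following is only an illustrative sketch of the decoding the dictionary subcommand would need for this type, not the actual parquet-cli code. `split_fixed_len` is a hypothetical helper, and the buffer contents and value order are assumed rather than read from the file.

```python
def split_fixed_len(page: bytes, type_length: int) -> list[bytes]:
    """Split an uncompressed, PLAIN-encoded buffer into fixed-length values."""
    if type_length <= 0:
        raise ValueError("type_length must be positive")
    if len(page) % type_length != 0:
        raise ValueError("buffer size is not a multiple of type_length")
    # Each value occupies exactly type_length bytes; there is no length prefix.
    return [page[i:i + type_length] for i in range(0, len(page), type_length)]


# Assumed dictionary page payload: the three 3-byte values from the example.
page = b"foobarbaz"
for index, value in enumerate(split_fixed_len(page, 3)):
    # Print each entry as hex, mirroring how the pages output shows min / max.
    print(f"{index}: 0x{value.hex().upper()}")
```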