Kengo Seki created PARQUET-2325:
-----------------------------------

             Summary: Fix parquet-cli's dictionary subcommand to work with FIXED_LEN_BYTE_ARRAY
                 Key: PARQUET-2325
                 URL: https://issues.apache.org/jira/browse/PARQUET-2325
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cli
            Reporter: Kengo Seki
            Assignee: Kengo Seki


I created a Parquet file containing a FIXED_LEN_BYTE_ARRAY column with a 
dictionary, and confirmed with the pages subcommand that it has a dictionary page:

{code}
$ python
Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> tbl = pa.Table.from_arrays([pa.array(["foo", "bar", "baz"], type=pa.binary(3))], ["col"])
>>> pq.write_table(tbl, use_dictionary=True, where='/tmp/example.parquet')
>>> 
$ java -cp 'target/parquet-cli-1.14.0-SNAPSHOT.jar:target/dependency/*' org.apache.parquet.cli.Main pages /tmp/example.parquet

Column: col
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  S _  3       3.00 B     9 B       
  0-1    data  S R  3       3.33 B     10 B                0       "0x626172" / "0x666F6F"
{code}
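
As a quick sanity check on the pages output above, the following standalone sketch (standard library only, no pyarrow) recomputes the dictionary payload size and the hex-encoded min / max values from the three 3-byte strings that were written. The helper name to_hex is made up for this illustration, and the sketch assumes the dictionary page stores the fixed-width values back to back with no length prefix.

```python
# The three fixed-width (3-byte) values written to the "col" column.
values = [b"foo", b"bar", b"baz"]


def to_hex(raw: bytes) -> str:
    """Render bytes the way the pages output shows statistics, e.g. 0x626172."""
    return "0x" + raw.hex().upper()


# Assumption: fixed-width values are stored back to back, so the payload
# is simply (number of values) * (width of each value).
dict_payload_size = sum(len(v) for v in values)

# Byte-wise ordering gives the same min / max as the column statistics.
min_hex = to_hex(min(values))
max_hex = to_hex(max(values))

print(dict_payload_size, min_hex, max_hex)
```

This yields 9 bytes for the dictionary payload, "0x626172" ("bar") as the minimum, and "0x666F6F" ("foo") as the maximum, which agrees with the 0-D and 0-1 rows shown above.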

However, the dictionary subcommand fails on that column with an "Unknown dictionary type" error:

{code}
$ java -cp 'target/parquet-cli-1.14.0-SNAPSHOT.jar:target/dependency/*' org.apache.parquet.cli.Main dictionary /tmp/example.parquet -c col

Row group 0 dictionary for "col":
Argument error: Unknown dictionary type: FIXED_LEN_BYTE_ARRAY
{code}
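
One plausible reading of this error is that the subcommand dispatches on the column's physical type and FIXED_LEN_BYTE_ARRAY falls through to a default branch. The sketch below is a hypothetical Python model of that kind of dispatch, not the actual parquet-cli (Java) code: the function render_entry and its branches are assumptions made purely for illustration. It shows the kind of change that would help, namely handling FIXED_LEN_BYTE_ARRAY the same way as BINARY, since both hold raw bytes.

```python
# Hypothetical model of a per-type dictionary renderer; NOT parquet-cli code.
def render_entry(physical_type: str, raw: bytes) -> str:
    if physical_type in ("BINARY", "FIXED_LEN_BYTE_ARRAY"):
        # Both types carry raw bytes; the fixed-length variant only adds
        # the guarantee that every value has the same width.
        return raw.decode("utf-8", errors="replace")
    if physical_type == "INT32":
        # Parquet stores INT32 values as 4-byte little-endian integers.
        return str(int.from_bytes(raw, "little", signed=True))
    # Any type without a branch ends up here, as in the error above.
    raise ValueError(f"Unknown dictionary type: {physical_type}")


# The three values written to the example file, rendered one per line.
for index, raw in enumerate([b"foo", b"bar", b"baz"]):
    print(f"{index:>3}: {render_entry('FIXED_LEN_BYTE_ARRAY', raw)}")
```

If the FIXED_LEN_BYTE_ARRAY entry were dropped from the first branch, the same call would raise "Unknown dictionary type: FIXED_LEN_BYTE_ARRAY", mirroring the reported behavior.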



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
