Masayuki Takahashi created PARQUET-1535:
-------------------------------------------

             Summary: [parquet-cli] dictionary command throw NPE when specified column isn't dictionary encoding
                 Key: PARQUET-1535
                 URL: https://issues.apache.org/jira/browse/PARQUET-1535
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.11.0
            Reporter: Masayuki Takahashi
         Attachments: test.parquet

The 'dictionary' command of parquet-cli throws an NPE when the specified column 
isn't dictionary-encoded.
{code}
$ java -cp 'target/classes:target/dependency/*' org.apache.parquet.cli.Main dictionary /work/parquet-mr/data/test.parquet -c binary_field
Unknown error
java.lang.NullPointerException
        at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
        at org.apache.parquet.cli.Main.run(Main.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.parquet.cli.Main.main(Main.java:177)
{code}
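
The trace above points at ShowDictionaryCommand.run. One plausible cause, which this report does not confirm, is that the lookup of a dictionary page returns null for a column that has none and the result is then dereferenced. The following is a minimal sketch of that pattern and of a null guard. It uses plain Java collections only; the class name, method name, and map layout are hypothetical stand-ins and are not the parquet-mr API.

```java
import java.util.HashMap;
import java.util.Map;

class DictionaryLookupSketch {
    // Hypothetical stand-in for one row group's dictionary pages, keyed by column name.
    // A column that is not dictionary-encoded has no entry, so the lookup yields null.
    static String describeDictionary(Map<String, String[]> dictionaryPages, String column) {
        String[] page = dictionaryPages.get(column);
        if (page == null) {
            // Guard: report the missing dictionary instead of dereferencing null.
            return "No dictionary for column \"" + column + "\"";
        }
        return "Dictionary for column \"" + column + "\": " + page.length + " entries";
    }

    public static void main(String[] args) {
        Map<String, String[]> rowGroup = new HashMap<>();
        // "dict_field" is an invented column that does have a dictionary page.
        rowGroup.put("dict_field", new String[] {"a", "b", "c"});

        System.out.println(describeDictionary(rowGroup, "dict_field"));
        // "binary_field" has no entry, mirroring the column used in the report.
        System.out.println(describeDictionary(rowGroup, "binary_field"));
    }
}
```

With a guard of this kind, the command could print a per-row-group message for columns without a dictionary rather than failing with "Unknown error".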

The schema of 'test.parquet' is as follows:
{code}
$ java -cp 'target/classes:target/dependency/*' org.apache.parquet.cli.Main meta /work/parquet-mr/data/test.parquet

File path:  /work/parquet-mr/data/test.parquet
Created by: parquet-mr version 1.12.0-SNAPSHOT (build 1e62e2e2ca903d4109480bc87ceec1dc954b6c92)
Properties:
  writer.model.name: example
Schema:
message test {
  required int32 int32_field;
  required int64 int64_field;
  required float float_field;
  required double double_field;
  required binary binary_field;
  required int64 timestamp_field (TIMESTAMP(MILLIS,true));
}


Row group 0:  count: 395  15.87 B records  start: 4  total: 6.120 kB
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
int32_field      INT32     _   D     395       0.20 B     0       "32" / "426"
int64_field      INT64     _   D     395       0.20 B     0       "64" / "458"
float_field      FLOAT     _   _     395       4.13 B     0       "1.0" / "395.0"
double_field     DOUBLE    _   _     395       8.13 B     0       "2.0" / "396.0"
binary_field     BINARY    _   D     395       2.98 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
timestamp_field  INT64     _   D     395       0.23 B     0       "2018-11-04T12:41:15.123+0000" / "2018-11-04T12:47:49.123+0000"

Row group 1:  count: 395  15.92 B records  start: 6271  total: 6.142 kB
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
int32_field      INT32     _   D     395       0.20 B     0       "427" / "821"
int64_field      INT64     _   D     395       0.20 B     0       "459" / "853"
float_field      FLOAT     _   _     395       4.13 B     0       "396.0" / "790.0"
double_field     DOUBLE    _   _     395       8.13 B     0       "397.0" / "791.0"
binary_field     BINARY    _   D     395       3.03 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
timestamp_field  INT64     _   D     395       0.23 B     0       "2018-11-04T12:47:50.123+0000" / "2018-11-04T12:54:24.123+0000"

Row group 2:  count: 234  16.53 B records  start: 12560  total: 3.777 kB
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
int32_field      INT32     _   D     234       0.17 B     0       "822" / "1055"
int64_field      INT64     _   D     234       0.31 B     0       "854" / "1087"
float_field      FLOAT     _   _     234       4.11 B     0       "791.0" / "1024.0"
double_field     DOUBLE    _   _     234       8.21 B     0       "792.0" / "1025.0"
binary_field     BINARY    _   D     234       3.38 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
timestamp_field  INT64     _   D     234       0.35 B     0       "2018-11-04T12:54:25.123+0000" / "2018-11-04T12:58:18.123+0000"
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
