Masayuki Takahashi created PARQUET-1535:
-------------------------------------------
Summary: [parquet-cli] dictionary command throw NPE when specified
column isn't dictionary encoding
Key: PARQUET-1535
URL: https://issues.apache.org/jira/browse/PARQUET-1535
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.11.0
Reporter: Masayuki Takahashi
Attachments: test.parquet
The 'dictionary' command of parquet-cli throws an NPE when the specified
column isn't dictionary-encoded.
{code}
$ java -cp 'target/classes:target/dependency/*' org.apache.parquet.cli.Main dictionary /work/parquet-mr/data/test.parquet -c binary_field
Unknown error
java.lang.NullPointerException
at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
at org.apache.parquet.cli.Main.run(Main.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.parquet.cli.Main.main(Main.java:177)
{code}
The schema of 'test.parquet' is as follows:
{code}
$ java -cp 'target/classes:target/dependency/*' org.apache.parquet.cli.Main meta /work/parquet-mr/data/test.parquet
File path: /work/parquet-mr/data/test.parquet
Created by: parquet-mr version 1.12.0-SNAPSHOT (build 1e62e2e2ca903d4109480bc87ceec1dc954b6c92)
Properties:
writer.model.name: example
Schema:
message test {
required int32 int32_field;
required int64 int64_field;
required float float_field;
required double double_field;
required binary binary_field;
required int64 timestamp_field (TIMESTAMP(MILLIS,true));
}
Row group 0: count: 395 15.87 B records start: 4 total: 6.120 kB
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
int32_field      INT32     _ D       395       0.20 B     0       "32" / "426"
int64_field      INT64     _ D       395       0.20 B     0       "64" / "458"
float_field      FLOAT     _ _       395       4.13 B     0       "1.0" / "395.0"
double_field     DOUBLE    _ _       395       8.13 B     0       "2.0" / "396.0"
binary_field     BINARY    _ D       395       2.98 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
timestamp_field  INT64     _ D       395       0.23 B     0       "2018-11-04T12:41:15.123+0000" / "2018-11-04T12:47:49.123+0000"
Row group 1: count: 395 15.92 B records start: 6271 total: 6.142 kB
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
int32_field      INT32     _ D       395       0.20 B     0       "427" / "821"
int64_field      INT64     _ D       395       0.20 B     0       "459" / "853"
float_field      FLOAT     _ _       395       4.13 B     0       "396.0" / "790.0"
double_field     DOUBLE    _ _       395       8.13 B     0       "397.0" / "791.0"
binary_field     BINARY    _ D       395       3.03 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
timestamp_field  INT64     _ D       395       0.23 B     0       "2018-11-04T12:47:50.123+0000" / "2018-11-04T12:54:24.123+0000"
Row group 2: count: 234 16.53 B records start: 12560 total: 3.777 kB
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
int32_field      INT32     _ D       234       0.17 B     0       "822" / "1055"
int64_field      INT64     _ D       234       0.31 B     0       "854" / "1087"
float_field      FLOAT     _ _       234       4.11 B     0       "791.0" / "1024.0"
double_field     DOUBLE    _ _       234       8.21 B     0       "792.0" / "1025.0"
binary_field     BINARY    _ D       234       3.38 B     0       "0x6162636465666768696A6B6..." / "0x6162636465666768696A6B6..."
timestamp_field  INT64     _ D       234       0.35 B     0       "2018-11-04T12:54:25.123+0000" / "2018-11-04T12:58:18.123+0000"
{code}
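The trace alone does not show what is null at ShowDictionaryCommand.java:78. One plausible reading, stated here as an assumption, is that the dictionary page lookup yields null for a column chunk that has no dictionary page, and the result is then dereferenced without a check. The self-contained Java sketch below illustrates such a guard using hypothetical stand-in types: DictStore, DictPage, showDictionary, and the column name dict_col are invented for illustration and are not parquet-mr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: every type here is a hypothetical stand-in,
// not a class from parquet-mr.
class DictionaryGuardSketch {

    /** Hypothetical stand-in for a dictionary page: just its entries. */
    static final class DictPage {
        final String[] entries;

        DictPage(String... entries) {
            this.entries = entries;
        }
    }

    /** Hypothetical per-row-group store mapping column name to dictionary page. */
    static final class DictStore {
        private final Map<String, DictPage> pages = new LinkedHashMap<>();

        DictStore with(String column, DictPage page) {
            pages.put(column, page);
            return this;
        }

        /** Returns null when the column has no dictionary page (assumed behavior). */
        DictPage readDictionaryPage(String column) {
            return pages.get(column);
        }
    }

    /** Renders the dictionary of a column, or a message when there is none. */
    static String showDictionary(DictStore store, String column) {
        DictPage page = store.readDictionaryPage(column);
        if (page == null) {
            // Guard: report the missing dictionary instead of dereferencing null.
            return "No dictionary for column " + column;
        }
        StringBuilder sb = new StringBuilder("Dictionary for column " + column + ":");
        for (int i = 0; i < page.entries.length; i++) {
            sb.append('\n').append(i).append(": ").append(page.entries[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // dict_col is a made-up dictionary-encoded column; binary_field has no page.
        DictStore rowGroup = new DictStore().with("dict_col", new DictPage("a", "b"));
        System.out.println(showDictionary(rowGroup, "dict_col"));
        System.out.println(showDictionary(rowGroup, "binary_field"));
    }
}
```

With a guard of this kind, asking for a column without a dictionary would produce a readable message rather than "Unknown error" and a stack trace. Whether the actual fix should print a message, skip the row group, or exit with an error code is a design choice for the maintainers.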