[ https://issues.apache.org/jira/browse/PARQUET-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556828#comment-17556828 ]
Gidon Gershinsky commented on PARQUET-2120:
-------------------------------------------
[~shangxinli] and the Parquet community, can you assign this Jira to [~rshkv]?
> parquet-cli dictionary command fails on pages without dictionary encoding
> -------------------------------------------------------------------------
>
> Key: PARQUET-2120
> URL: https://issues.apache.org/jira/browse/PARQUET-2120
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cli
> Affects Versions: 1.12.2
> Reporter: Willi Raschkowski
> Priority: Minor
> Fix For: 1.12.3
>
>
> parquet-cli's {{dictionary}} command fails with an NPE if a page does not
> have dictionary encoding:
> {code}
> $ parquet dictionary --column col a-b-c.snappy.parquet
> Unknown error
> java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
> at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
> at org.apache.parquet.cli.Main.run(Main.java:155)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.parquet.cli.Main.main(Main.java:185)
> $ parquet meta a-b-c.snappy.parquet
> ...
> Row group 0: count: 1 46.00 B records start: 4 total: 46 B
> --------------------------------------------------------------------------------
> type encodings count avg size nulls min / max
> col BINARY S _ 1 46.00 B 0 "a" / "a"
> Row group 1: count: 200 0.34 B records start: 50 total: 69 B
> --------------------------------------------------------------------------------
> type encodings count avg size nulls min / max
> col BINARY S _ R 200 0.34 B 0 "b" / "c"
> {code}
> (Note the missing {{R}} / dictionary encoding on that first page.)
> Someone familiar with Parquet might guess from the NPE that there's no
> dictionary encoding. But for files that mix pages with and without dictionary
> encoding (like above), the command will fail before getting to pages that
> actually have dictionaries.
> The problem is that [this
> line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76]
> assumes {{readDictionaryPage}} always returns a page and doesn't handle when
> it does not, i.e. when it returns {{null}}.
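
A minimal sketch of the null handling the description asks for, not the actual parquet-cli patch. To stay self-contained it uses a hypothetical stand-in class (FakeDictionaryPage) instead of the real DictionaryPage, and models the file as a list with one entry per row group, where null means the lookup returned no dictionary page. The class name, method name, and output strings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictionaryNullGuardSketch {

    // Hypothetical stand-in for a dictionary page; NOT the parquet-mr class.
    public static final class FakeDictionaryPage {
        final String encoding;

        public FakeDictionaryPage(String encoding) {
            this.encoding = encoding;
        }
    }

    // Each list element models the result of looking up the dictionary page
    // for one row group; null models "no dictionary page for this column".
    public static List<String> describe(List<FakeDictionaryPage> pagesPerRowGroup) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < pagesPerRowGroup.size(); i++) {
            FakeDictionaryPage page = pagesPerRowGroup.get(i);
            if (page == null) {
                // Report the gap and keep going instead of dereferencing null,
                // so later row groups that do have dictionaries are still shown.
                lines.add("Row group " + i + ": no dictionary");
                continue;
            }
            lines.add("Row group " + i + ": dictionary encoding " + page.encoding);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mirrors the mixed file above: first row group without a dictionary,
        // second with one. The encoding string is illustrative only.
        List<FakeDictionaryPage> pages =
            Arrays.asList(null, new FakeDictionaryPage("PLAIN_DICTIONARY"));
        describe(pages).forEach(System.out::println);
    }
}
```

The point of skipping rather than aborting is that a file mixing both kinds of row groups still yields output for the ones that have dictionaries.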