[ https://issues.apache.org/jira/browse/PARQUET-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556828#comment-17556828 ]
Gidon Gershinsky commented on PARQUET-2120:
-------------------------------------------

[~shangxinli] and the Parquet community, can you assign this Jira to [~rshkv]

> parquet-cli dictionary command fails on pages without dictionary encoding
> -------------------------------------------------------------------------
>
>                 Key: PARQUET-2120
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2120
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cli
>    Affects Versions: 1.12.2
>            Reporter: Willi Raschkowski
>            Priority: Minor
>             Fix For: 1.12.3
>
>
> parquet-cli's {{dictionary}} command fails with an NPE if a page does not
> have dictionary encoding:
> {code}
> $ parquet dictionary --column col a-b-c.snappy.parquet
> Unknown error
> java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
>   at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
>   at org.apache.parquet.cli.Main.run(Main.java:155)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.parquet.cli.Main.main(Main.java:185)
> $ parquet meta a-b-c.snappy.parquet
> ...
> Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B
> --------------------------------------------------------------------------------
>      type      encodings  count  avg size  nulls  min / max
> col  BINARY    S   _      1      46.00 B   0      "a" / "a"
> Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B
> --------------------------------------------------------------------------------
>      type      encodings  count  avg size  nulls  min / max
> col  BINARY    S _ R      200    0.34 B    0      "b" / "c"
> {code}
> (Note the missing {{R}} / dictionary encoding on that first page.)
> Someone familiar with Parquet might guess from the NPE that there's no
> dictionary encoding. But for files that mix pages with and without dictionary
> encoding (like above), the command will fail before getting to pages that
> actually have dictionaries.
> The problem is that [this line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76]
> assumes {{readDictionaryPage}} always returns a page and doesn't handle when
> it does not, i.e. when it returns {{null}}.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
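For illustration, the null handling the description asks for could look roughly like the sketch below. It does not use the real parquet-mr classes and is not the actual patch: {{FakeDictionaryPage}}, {{FakeDictionaryReader}} and {{describeDictionaries}} are hypothetical stand-ins, and the sample encoding name and entry count are made up. The only behavior taken from the issue is that {{readDictionaryPage}} may return {{null}}. The loop skips a row group without a dictionary instead of dereferencing the missing page, so later row groups that do have one are still reported.

```java
import java.util.ArrayList;
import java.util.List;

public class DictionaryGuardSketch {

    /** Hypothetical stand-in for a dictionary page; NOT the parquet-mr class. */
    static final class FakeDictionaryPage {
        final String encoding;
        final int entries;

        FakeDictionaryPage(String encoding, int entries) {
            this.encoding = encoding;
            this.entries = entries;
        }
    }

    /**
     * Hypothetical stand-in for a per-row-group dictionary reader.
     * Returns null when the column chunk has no dictionary page,
     * which is the case the issue says is not handled.
     */
    interface FakeDictionaryReader {
        FakeDictionaryPage readDictionaryPage(String column);
    }

    /** Builds one line per row group, skipping row groups without a dictionary. */
    static List<String> describeDictionaries(List<FakeDictionaryReader> rowGroups, String column) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            FakeDictionaryPage page = rowGroups.get(i).readDictionaryPage(column);
            if (page == null) {
                // Guard: report the missing dictionary and move on to the next row group.
                lines.add("Row group " + i + ": no dictionary for column " + column);
                continue;
            }
            lines.add("Row group " + i + ": dictionary for column " + column
                + " (encoding " + page.encoding + ", " + page.entries + " entries)");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample: first row group has no dictionary, second one does.
        List<FakeDictionaryReader> rowGroups = new ArrayList<>();
        rowGroups.add(column -> null);
        rowGroups.add(column -> new FakeDictionaryPage("PLAIN", 2));

        for (String line : describeDictionaries(rowGroups, "col")) {
            System.out.println(line);
        }
    }
}
```

Whether the real command should print a notice for such row groups or skip them silently is a design choice this sketch does not settle.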