Willi Raschkowski created PARQUET-2120:
------------------------------------------

             Summary: parquet-cli dictionary fails on pages without dictionary encoding
                 Key: PARQUET-2120
                 URL: https://issues.apache.org/jira/browse/PARQUET-2120
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cli
    Affects Versions: 1.12.2
            Reporter: Willi Raschkowski


parquet-cli's {{dictionary}} command fails with an NPE if a page does not have 
dictionary encoding:

{code}
$ parquet dictionary --column col a-b-c.snappy.parquet                
Unknown error
java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
        at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
        at org.apache.parquet.cli.Main.run(Main.java:155)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:185)

$ parquet meta a-b-c.snappy.parquet      
...
Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B
--------------------------------------------------------------------------------
     type      encodings count     avg size   nulls   min / max
col  BINARY    S   _     1         46.00 B    0       "a" / "a"

Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B
--------------------------------------------------------------------------------
     type      encodings count     avg size   nulls   min / max
col  BINARY    S _ R     200       0.34 B     0       "b" / "c"
{code}
(Note the missing {{R}}, i.e. dictionary encoding, in row group 0.)

The problem is that [this line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76] assumes {{readDictionaryPage}} always returns a page and does not handle the case where it returns {{null}}.
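
One possible shape for a fix is a null check before the page is used. The following is only a minimal sketch of that idea, not the actual {{ShowDictionaryCommand}} code: {{Page}}, {{PageSource}}, {{describe}} and the output strings are hypothetical stand-ins so the example compiles without Parquet on the classpath, and {{PLAIN_DICTIONARY}} is used purely as an illustrative encoding name.

```java
public class DictionaryNullCheckSketch {

    // Hypothetical stand-in for a dictionary page; only the encoding is modeled.
    public static final class Page {
        private final String encoding;

        public Page(String encoding) {
            this.encoding = encoding;
        }

        public String getEncoding() {
            return encoding;
        }
    }

    // Hypothetical stand-in for the reader. Like the behaviour described in
    // this report, it may return null when there is no dictionary page.
    public interface PageSource {
        Page readDictionaryPage(int rowGroup);
    }

    // Checks for null before touching the page, so a row group without a
    // dictionary page produces a message instead of a NullPointerException.
    public static String describe(PageSource source, int rowGroup) {
        Page page = source.readDictionaryPage(rowGroup);
        if (page == null) {
            return "Row group " + rowGroup + ": no dictionary page";
        }
        return "Row group " + rowGroup + ": dictionary encoding " + page.getEncoding();
    }

    public static void main(String[] args) {
        // Mirrors the file above: row group 0 has no dictionary, row group 1 does.
        PageSource source = rowGroup -> rowGroup == 0 ? null : new Page("PLAIN_DICTIONARY");
        System.out.println(describe(source, 0));
        System.out.println(describe(source, 1));
    }
}
```

Whether the command should skip such row groups, print a notice, or exit with a clearer error is a design choice this sketch does not settle.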



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
