[ https://issues.apache.org/jira/browse/PARQUET-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated PARQUET-2124:
------------------------------------
    Labels: pull-request-available  (was: )

> Bad DCHECK For Intermixed Dictionary Encoding
> ---------------------------------------------
>
>                 Key: PARQUET-2124
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2124
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: William Butler
>            Assignee: William Butler
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parquet CPP has a DCHECK that fires when a dictionary-encoded page comes
> after a non-dictionary-encoded page. This is bad because the DCHECK can be
> triggered by Parquet files in which a column has a dictionary page, then a
> non-dictionary-encoded page, then a page of dictionary-encoded values
> (indices). Fuzzing found such a file. While the DCHECK could be turned into
> an exception, I don't see anything in the Parquet specification that
> prohibits such a sequence of pages.
> This situation has been brought up on the mailing list before
> (https://lists.apache.org/thread/3bzymmbxvmzj12km7cjz1150ndvy9bos),
> and it seems that this is valid but nobody is doing it.
> In the PR that added this check
> (https://github.com/apache/parquet-cpp/pull/73) it was noted that the
> check is probably not needed.
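
To make the page sequence in the description concrete, here is a minimal sketch of a reader that tolerates it. This is not parquet-cpp code: the `Page` class and `decode_column_chunk` function are hypothetical names, and the dictionary indices are assumed to be already unpacked into plain integers (a real reader would first decode the RLE/bit-packed index stream). The only point illustrated is that the dictionary is kept for the whole column chunk, so a dictionary-encoded page can still be decoded after a plain one.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class Page:
    """Hypothetical, simplified stand-in for a Parquet page."""
    kind: str                # "dictionary" or "data"
    encoding: str            # "PLAIN" or "RLE_DICTIONARY"
    values: Sequence[Any]    # dictionary entries, plain values, or indices


def decode_column_chunk(pages: Sequence[Page]) -> List[Any]:
    """Decode the pages of one column chunk, allowing intermixed encodings."""
    dictionary: Optional[List[Any]] = None
    out: List[Any] = []
    for page in pages:
        if page.kind == "dictionary":
            # Keep the dictionary for the rest of the column chunk.
            dictionary = list(page.values)
        elif page.encoding == "RLE_DICTIONARY":
            # Malformed input is reported as an error instead of an assertion.
            if dictionary is None:
                raise ValueError("dictionary-encoded page without a dictionary page")
            for index in page.values:
                if not 0 <= index < len(dictionary):
                    raise ValueError(f"dictionary index {index} out of range")
                out.append(dictionary[index])
        elif page.encoding == "PLAIN":
            # A plain page does not invalidate the dictionary seen earlier.
            out.extend(page.values)
        else:
            raise ValueError(f"unsupported encoding: {page.encoding}")
    return out


# The sequence from the report: dictionary page, plain data page,
# then a page of dictionary indices.
pages = [
    Page("dictionary", "PLAIN", ["a", "b"]),
    Page("data", "PLAIN", ["c"]),
    Page("data", "RLE_DICTIONARY", [1, 0]),
]
decoded = decode_column_chunk(pages)
```

In this sketch the only failures are a dictionary-encoded page with no preceding dictionary page and an out-of-range index, and both raise an exception rather than aborting, which matters for files produced by a fuzzer.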