[ https://issues.apache.org/jira/browse/PARQUET-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved PARQUET-2124.
-------------------------------------
Fix Version/s: cpp-8.0.0
Resolution: Fixed
Issue resolved by pull request 12427
[https://github.com/apache/arrow/pull/12427]
> Bad DCHECK For Intermixed Dictionary Encoding
> ---------------------------------------------
>
> Key: PARQUET-2124
> URL: https://issues.apache.org/jira/browse/PARQUET-2124
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: William Butler
> Assignee: William Butler
> Priority: Minor
> Labels: pull-request-available
> Fix For: cpp-8.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Parquet CPP has a DCHECK for a dictionary encoded page coming after a
> non-dictionary encoded page. This is bad because the DCHECK can be triggered
> by Parquet files that have a column that has a dictionary page, then a
> non-dictionary encoded page, then a page of dictionary encoded
> values (indices). Fuzzing found such a file. While this could be turned into
> an exception, I don't see anything in the Parquet specification that
> prohibits such an occurrence of pages.
> This situation has been brought up on the mailing list before
> (https://lists.apache.org/thread/3bzymmbxvmzj12km7cjz1150ndvy9bos),
> and it seems that this is valid but that nobody is doing it.
> In the PR that added this check
> (https://github.com/apache/parquet-cpp/pull/73), it was noted that the
> check is probably not needed.
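
The page sequence described above (dictionary page, plain-encoded data page,
then a data page of dictionary indices) can be sketched as follows. This is a
minimal illustration only: PageKind, Page and DecodeColumn are hypothetical
names invented for this example and are not the Arrow/Parquet C++ reader
types or the code changed by the pull request. It assumes string values and
shows a decoder that picks the decoding path per page instead of asserting
on the order of encodings.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified page model (not the real parquet-cpp types).
enum class PageKind { Dictionary, DataPlain, DataDictIndices };

struct Page {
  PageKind kind;
  std::vector<std::string> values;  // used by Dictionary and DataPlain pages
  std::vector<uint32_t> indices;    // used by DataDictIndices pages
};

// Decodes the pages of one column chunk. Each data page is decoded according
// to its own encoding, so a dictionary-encoded page may follow a plain page.
std::vector<std::string> DecodeColumn(const std::vector<Page>& pages) {
  std::vector<std::string> dictionary;
  bool have_dictionary = false;
  std::vector<std::string> out;

  for (const Page& page : pages) {
    switch (page.kind) {
      case PageKind::Dictionary:
        dictionary = page.values;
        have_dictionary = true;
        break;
      case PageKind::DataPlain:
        // Plain pages carry their values directly.
        out.insert(out.end(), page.values.begin(), page.values.end());
        break;
      case PageKind::DataDictIndices:
        // Malformed input is reported as an error rather than asserted on.
        if (!have_dictionary) {
          throw std::runtime_error("dictionary indices without a dictionary page");
        }
        for (uint32_t idx : page.indices) {
          if (idx >= dictionary.size()) {
            throw std::out_of_range("dictionary index out of range");
          }
          out.push_back(dictionary[idx]);
        }
        break;
    }
  }
  return out;
}
```

With a dictionary {"a", "b"}, a plain page holding "c", and an index page
holding {1, 0}, this sketch yields "c", "b", "a". Only genuinely malformed
input (indices with no dictionary, or an index past the end of the
dictionary) raises an error.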