[ 
https://issues.apache.org/jira/browse/PARQUET-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated PARQUET-2124:
------------------------------------
    Labels: pull-request-available  (was: )

> Bad DCHECK For Intermixed Dictionary Encoding
> ---------------------------------------------
>
>                 Key: PARQUET-2124
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2124
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: William Butler
>            Assignee: William Butler
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parquet CPP has a DCHECK that fires when a dictionary-encoded page comes 
> after a non-dictionary-encoded page. This is a problem because the DCHECK 
> can be triggered by a Parquet file whose column has a dictionary page, then 
> a non-dictionary-encoded page, then a page of dictionary-encoded values 
> (indices). Fuzzing found such a file. While the DCHECK could be turned into 
> an exception, I don't see anything in the Parquet specification that 
> prohibits such a sequence of pages.
> This situation has been brought up on the mailing list before 
> (https://lists.apache.org/thread/3bzymmbxvmzj12km7cjz1150ndvy9bos), and it 
> seems that this is valid but nobody is doing it.
> In the PR that added this check 
> (https://github.com/apache/parquet-cpp/pull/73), it was noted that the 
> check is probably not needed.
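
The page sequence described above (dictionary page, plain page, dictionary-encoded page) can be illustrated with a minimal, self-contained sketch. This is not the parquet-cpp code or API: `Encoding`, `DataPage`, and `ReadColumn` are hypothetical names, and real pages hold encoded bytes rather than decoded vectors. The sketch only shows a reader that picks the decoding path per page instead of asserting that pages never return to dictionary encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified model of a column chunk; not the parquet-cpp API.
enum class Encoding { PLAIN, RLE_DICTIONARY };

struct DataPage {
  Encoding encoding;
  std::vector<std::string> plain_values;  // used when encoding == PLAIN
  std::vector<int32_t> dict_indices;      // used when encoding == RLE_DICTIONARY
};

// Decodes every page using the encoding that page declares. The order of
// encodings is not checked, so a dictionary-encoded page may follow a plain one.
std::vector<std::string> ReadColumn(const std::vector<std::string>& dictionary,
                                    const std::vector<DataPage>& pages) {
  std::vector<std::string> out;
  for (const DataPage& page : pages) {
    if (page.encoding == Encoding::PLAIN) {
      out.insert(out.end(), page.plain_values.begin(), page.plain_values.end());
      continue;
    }
    for (int32_t idx : page.dict_indices) {
      // Untrusted input: reject bad indices with an exception, not an assert.
      if (idx < 0 || static_cast<std::size_t>(idx) >= dictionary.size()) {
        throw std::out_of_range("dictionary index out of range");
      }
      out.push_back(dictionary[static_cast<std::size_t>(idx)]);
    }
  }
  return out;
}
```

Under these assumptions, a chunk with dictionary {"a", "b"}, a dictionary-encoded page, a plain page, and another dictionary-encoded page decodes in page order. Only malformed indices raise an error.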



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
