[
https://issues.apache.org/jira/browse/ARROW-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662343#comment-17662343
]
Rok Mihevc commented on ARROW-5322:
-----------------------------------
This issue has been migrated to [issue
#21784|https://github.com/apache/arrow/issues/21784] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> [C++] [Parquet] Parquet files with dictionary page offset as 0 is not
> readable
> -------------------------------------------------------------------------------
>
> Key: ARROW-5322
> URL: https://issues.apache.org/jira/browse/ARROW-5322
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: shyam narayan singh
> Priority: Major
> Labels: parquet, pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Many Parquet files generated in our customers' environments can be read by
> the Java Parquet readers but not by the C++ Parquet reader or pyarrow.
> The Java readers check whether "dictionaryPageOffset" is 0 to determine if a
> dictionary page exists, whereas the C++ reader uses "has_dictionaryPageOffset"
> (the _isset bit in the Thrift message) to determine the same, which results in
> incompatible behaviour. This incompatibility is limiting pyarrow usage in
> our customers' environments.
> This change would make the C++ Parquet reader and pyarrow more usable and
> compatible with the Java Parquet readers.
>
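The difference between the two checks described in the issue can be sketched as follows. This is a minimal illustration, not the actual Arrow or parquet-mr code: the `ColumnMeta` struct and the function names are hypothetical stand-ins for the Thrift-generated column metadata, and the sketch assumes that an offset of 0 means "no dictionary page".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Thrift-generated column metadata.
// Field and function names are illustrative, not the real Arrow types.
struct ColumnMeta {
  bool has_dictionary_page_offset;  // models the Thrift _isset bit
  int64_t dictionary_page_offset;   // may be 0 even when the bit is set
  int64_t data_page_offset;
};

// Strict check: trusts the _isset bit alone, as the issue says the
// C++ reader does.
bool HasDictionaryStrict(const ColumnMeta& m) {
  return m.has_dictionary_page_offset;
}

// Lenient check (assumption): additionally treats an offset of 0 as
// "no dictionary page", matching the Java behaviour the issue describes.
bool HasDictionaryLenient(const ColumnMeta& m) {
  return m.has_dictionary_page_offset && m.dictionary_page_offset > 0;
}

// Where reading of the column chunk would start under the lenient check.
int64_t ColumnStart(const ColumnMeta& m) {
  assert(m.data_page_offset >= 0);
  return HasDictionaryLenient(m) ? m.dictionary_page_offset
                                 : m.data_page_offset;
}
```

For a file whose metadata sets the bit but stores offset 0, the strict check reports a dictionary page at position 0, while the lenient check falls back to the data page offset.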