shyam narayan singh created ARROW-5322:
------------------------------------------
Summary: [C++] [Parquet] Parquet files with dictionary page offset
as 0 is not readable
Key: ARROW-5322
URL: https://issues.apache.org/jira/browse/ARROW-5322
Project: Apache Arrow
Issue Type: Bug
Reporter: shyam narayan singh
There are many Parquet files generated in our customers' environments that can be
read by the Java Parquet readers but not by the C++ Parquet reader or pyarrow.
The reason is that the Java readers treat "dictionaryPageOffset = 0" as meaning that
no dictionary page exists, whereas the C++ reader uses "has_dictionaryPageOffset"
(the _isset bit in the Thrift message) to decide the same thing. The two checks
disagree for files where the offset field is set but equal to 0, and this
incompatibility is limiting pyarrow usage in our customers' environments.
Handling this case the same way would make the C++ Parquet reader and pyarrow more
usable and compatible with the Java Parquet readers.
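The difference between the two checks can be sketched as follows. This is a hypothetical illustration, not parquet-cpp's or parquet-mr's actual code: `ColumnMetaDataSketch`, `IssetFlags` and the three helper functions are made-up names that mimic a Thrift-generated struct carrying an isset flag. The last helper shows one possible compatible rule, assuming that an offset of 0 can never be a real page location because every Parquet file begins with the 4-byte magic `PAR1`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Thrift-generated isset flags: records whether
// the writer explicitly set the optional dictionary_page_offset field.
struct IssetFlags {
  bool dictionary_page_offset = false;
};

// Hypothetical stand-in for the Thrift ColumnMetaData (only the fields needed here).
struct ColumnMetaDataSketch {
  int64_t data_page_offset = 0;
  int64_t dictionary_page_offset = 0;
  IssetFlags isset;
};

// Behaviour described for the C++ reader: trust the isset bit alone.
bool HasDictionaryPageIssetOnly(const ColumnMetaDataSketch& md) {
  return md.isset.dictionary_page_offset;
}

// Behaviour described for the Java readers: an offset of 0 means "no dictionary page".
bool HasDictionaryPageJavaStyle(const ColumnMetaDataSketch& md) {
  return md.dictionary_page_offset != 0;
}

// One possible compatible rule (an assumption, not the merged fix): the field must
// be set AND point past offset 0, where the "PAR1" magic bytes live, not a page.
bool HasDictionaryPageCompatible(const ColumnMetaDataSketch& md) {
  return md.isset.dictionary_page_offset && md.dictionary_page_offset > 0;
}
```

For metadata written with the isset bit set but `dictionary_page_offset` equal to 0, the isset-only check reports a dictionary page (so a reader would try to decode one at the very start of the file), while the Java-style and compatible checks both report that there is none.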