freakyzoidberg opened a new pull request, #551:
URL: https://github.com/apache/arrow-go/pull/551
Fix the spurious `parquet: column chunk cannot have more than one dictionary.`
error raised when reading certain Parquet files.
The error occurs for Parquet files with:
* an Arrow dictionary column
* the Arrow schema serialized in the Parquet metadata
* column chunks with one dictionary page and at least two data pages
When maybeWriteNewDictionary() resets `newDictionary = false` at line 965,
the next call to readDictionary() tries to read the dictionary page from the
pager again. That in turn calls configureDict() a second time, which fails
with the "cannot have more than one dictionary" error.
The sequence is:
1. Read DICTIONARY_PAGE → newDictionary = true
2. Read DATA_PAGE_1 → calls maybeWriteNewDictionary() → resets
   newDictionary = false
3. Read DATA_PAGE_2 → calls readDictionary() → since newDictionary =
   false, tries to get the dictionary page again → calls configureDict() →
   ERROR because the decoder already exists
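
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions, not the arrow-go implementation: `toyReader`, `hasDecoder`, `guard`, and `readChunk` are hypothetical names introduced for illustration, and only `newDictionary`, `readDictionary`, `configureDict`, and `maybeWriteNewDictionary` come from the description. The `guard` flag shows one possible way to avoid the second `configureDict()` call; it is not necessarily the change made in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errMultipleDicts mirrors the error text quoted in this PR.
var errMultipleDicts = errors.New("parquet: column chunk cannot have more than one dictionary.")

// toyReader is a hypothetical stand-in for a column chunk reader.
// Only newDictionary and the three method names come from the PR text.
type toyReader struct {
	newDictionary bool // set when a dictionary page has just been decoded
	hasDecoder    bool // a dictionary decoder is already configured
	guard         bool // hypothetical fix: skip re-reading when a decoder exists
}

// configureDict installs the dictionary decoder and refuses a second one.
func (r *toyReader) configureDict() error {
	if r.hasDecoder {
		return errMultipleDicts
	}
	r.hasDecoder = true
	r.newDictionary = true
	return nil
}

// readDictionary treats newDictionary as "dictionary already loaded";
// once the flag is cleared it goes back for the dictionary page.
func (r *toyReader) readDictionary() error {
	if r.newDictionary || (r.guard && r.hasDecoder) {
		return nil
	}
	return r.configureDict()
}

// maybeWriteNewDictionary hands the dictionary over and clears the flag.
func (r *toyReader) maybeWriteNewDictionary() {
	r.newDictionary = false
}

// readChunk replays one dictionary page followed by nDataPages data pages.
func readChunk(r *toyReader, nDataPages int) error {
	if err := r.readDictionary(); err != nil { // step 1: DICTIONARY_PAGE
		return err
	}
	for i := 0; i < nDataPages; i++ {
		if err := r.readDictionary(); err != nil {
			return fmt.Errorf("data page %d: %w", i+1, err)
		}
		r.maybeWriteNewDictionary() // steps 2 and 3: flag is reset per data page
	}
	return nil
}

func main() {
	fmt.Println(readChunk(&toyReader{}, 1))            // single data page
	fmt.Println(readChunk(&toyReader{}, 2))            // two data pages, no guard
	fmt.Println(readChunk(&toyReader{guard: true}, 2)) // two data pages, guarded
}
```

In this model a chunk with a single data page reads cleanly, a chunk with two data pages fails on the second data page with the error above, and the guarded reader handles both.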