freakyzoidberg opened a new pull request, #551:
URL: https://github.com/apache/arrow-go/pull/551

   Fix spurious `parquet: column chunk cannot have more than one dictionary.`
   error with certain Parquet files
   
   The error is triggered by a Parquet file with:
   * an Arrow dictionary column
   * the Arrow schema serialized in the Parquet metadata
   * column chunks with one dictionary page and at least two data pages
   
   When maybeWriteNewDictionary() resets `newDictionary = false` at line 965,
   the next call to readDictionary() tries to read the dictionary page again
   from the pager. That calls configureDict() a second time, which throws the
   "cannot have more than one dictionary" error.
   
     The sequence is:
     1. Read DICTIONARY_PAGE → newDictionary = true
     2. Read DATA_PAGE_1 → calls maybeWriteNewDictionary() → resets
        newDictionary = false
     3. Read DATA_PAGE_2 → calls readDictionary() → since newDictionary =
        false, tries to get the dictionary page again → calls configureDict()
        → ERROR because the decoder already exists
   


-- 
This is an automated message from the Apache Git Service.