zeroshade opened a new pull request, #668:
URL: https://github.com/apache/arrow-go/pull/668

   ### Rationale for this change
   Upstream fix for the issue identified in https://github.com/apache/iceberg-go/issues/737. When reading maps with nested values using column indices for selective column reading, the read failed if only some of the map's child fields were in the list of indices:
   
   - Maps are represented in Parquet as a list of key-value structs (`list<struct<key, value>>`).
   - The struct *MUST* have exactly 2 fields (key and value) to be converted into a properly typed Arrow Map column.
   - When column filtering was applied and only the key *OR* the value field (but not both) was in the list of columns, the resulting child struct had only 1 field.
   - As a result, the `Map.validateData()` method would panic with `arrow/array: map array child array should have two fields`.
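To make the failure concrete, here is a minimal, self-contained Go sketch. It does not use arrow-go: the `Field` type, `filterLeaves`, `validateMapEntries`, and `exampleEntries` are hypothetical stand-ins that model a map's key-value struct as a schema tree, naive leaf filtering by column index, and the two-field invariant a map array requires.

```go
package main

import (
	"errors"
	"fmt"
)

// Field is a hypothetical, simplified schema node for illustration only;
// it is not the pqarrow or arrow schema type.
type Field struct {
	Name      string
	LeafIndex int     // column index for leaves; -1 for groups
	Children  []Field // nested fields for groups
}

// filterLeaves keeps only leaves whose column index is selected and drops
// groups that end up empty. This models naive leaf filtering.
func filterLeaves(f Field, selected map[int]bool) (Field, bool) {
	if len(f.Children) == 0 {
		return f, selected[f.LeafIndex]
	}
	out := Field{Name: f.Name, LeafIndex: f.LeafIndex}
	for _, c := range f.Children {
		if kept, ok := filterLeaves(c, selected); ok {
			out.Children = append(out.Children, kept)
		}
	}
	return out, len(out.Children) > 0
}

// validateMapEntries models the invariant that a map's entry struct must
// have exactly two children (key and value). The message is modeled on the
// panic quoted above.
func validateMapEntries(entries Field) error {
	if len(entries.Children) != 2 {
		return errors.New("map array child array should have two fields")
	}
	return nil
}

// exampleEntries builds the key_value struct of a hypothetical
// map<string, struct<a, b>>: the key is leaf 0, the value has leaves 1 and 2.
func exampleEntries() Field {
	return Field{Name: "key_value", LeafIndex: -1, Children: []Field{
		{Name: "key", LeafIndex: 0},
		{Name: "value", LeafIndex: -1, Children: []Field{
			{Name: "a", LeafIndex: 1},
			{Name: "b", LeafIndex: 2},
		}},
	}}
}

func main() {
	// Selecting only a nested value leaf (index 1) drops the key entirely,
	// leaving a one-field entry struct that fails validation.
	filtered, _ := filterLeaves(exampleEntries(), map[int]bool{1: true})
	fmt.Println(len(filtered.Children), validateMapEntries(filtered))
}
```

Running it prints a child count of 1 followed by the validation error, which is the same shape of failure the bullets above describe.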
   
   ### What changes are included in this PR?
   In `pqarrow/file_reader.go`, leaf filtering is disabled when reading a map's key-value struct. This ensures the key and value columns are always read together, preserving the 2-field structure required for a map array.
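The idea behind the fix can be sketched with the same kind of hypothetical model (again, not the actual `pqarrow` code; `Field`, `IsMapEntries`, `containsSelected`, `filterLeaves`, and `exampleSchema` are illustrative names). Filtering proceeds leaf by leaf until it reaches a map's key-value struct; from there filtering is switched off, so if any leaf under that struct is selected, the whole subtree is kept. In this model, "disabled" means every leaf under the key-value struct is kept, including nested value leaves that were not selected; that detail is an assumption of the sketch.

```go
package main

import "fmt"

// Field is a hypothetical, simplified schema node for illustration only.
type Field struct {
	Name         string
	LeafIndex    int     // column index for leaves; -1 for groups
	IsMapEntries bool    // marks the key_value struct inside a map
	Children     []Field // nested fields for groups
}

// containsSelected reports whether any leaf under f is selected.
func containsSelected(f Field, selected map[int]bool) bool {
	if len(f.Children) == 0 {
		return selected[f.LeafIndex]
	}
	for _, c := range f.Children {
		if containsSelected(c, selected) {
			return true
		}
	}
	return false
}

// filterLeaves filters by leaf index, but stops filtering at a map's
// key_value struct: if any leaf below it is selected, the struct is kept
// whole, so key and value always come back together.
func filterLeaves(f Field, selected map[int]bool) (Field, bool) {
	if len(f.Children) == 0 {
		return f, selected[f.LeafIndex]
	}
	if f.IsMapEntries {
		return f, containsSelected(f, selected)
	}
	out := Field{Name: f.Name, LeafIndex: f.LeafIndex, IsMapEntries: f.IsMapEntries}
	for _, c := range f.Children {
		if kept, ok := filterLeaves(c, selected); ok {
			out.Children = append(out.Children, kept)
		}
	}
	return out, len(out.Children) > 0
}

// exampleSchema builds a hypothetical record: id (leaf 0) and
// attrs map<string, struct<a, b>> with key leaf 1 and value leaves 2 and 3.
func exampleSchema() Field {
	return Field{Name: "root", LeafIndex: -1, Children: []Field{
		{Name: "id", LeafIndex: 0},
		{Name: "attrs", LeafIndex: -1, Children: []Field{
			{Name: "key_value", LeafIndex: -1, IsMapEntries: true, Children: []Field{
				{Name: "key", LeafIndex: 1},
				{Name: "value", LeafIndex: -1, Children: []Field{
					{Name: "a", LeafIndex: 2},
					{Name: "b", LeafIndex: 3},
				}},
			}},
		}},
	}}
}

func main() {
	// Select id and only one nested value leaf of the map.
	out, _ := filterLeaves(exampleSchema(), map[int]bool{0: true, 2: true})
	entries := out.Children[1].Children[0]
	fmt.Println(len(entries.Children)) // key and value both survive
}
```

The trade-off in this model is reading a few extra leaf columns under the map in exchange for always producing a structurally valid map; a map with no selected leaves is still dropped entirely.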
   
   ### Are these changes tested?
   Yes, a test case has been added for this change.
   
   ### Are there any user-facing changes?
   This only affects reading map-typed columns with column selection, and makes those reads correct. The only change is that a failure mode has been eliminated.
   