Re: [I] Difference between arrow-go and pyarrow file writer [arrow-go]

via GitHub Fri, 03 Apr 2026 12:07:34 -0700


zeroshade commented on issue #744:
URL: https://github.com/apache/arrow-go/issues/744#issuecomment-4184737408


   So, after a bit of digging, it looks like the only difference I could find
between the file written by arrow-go and the one re-packed by pyarrow is a
field-name mismatch in the list element.
   
   The stored `ARROW:schema` metadata names the list element field "item",
which is the Arrow convention, but we always rename it to "element" in the
Parquet schema, as the Parquet spec calls for. As a result, the Parquet column
path is `ops.list.element.id` while the stored ARROW:schema says the element is
named "item". When re-packing the file, pyarrow reconstructs the Arrow schema
from the native Parquet schema and uses "element", which makes the
ARROW:schema consistent. (By the way, the Parquet reader in arrow-go does the
same thing and ignores the stored element name.)
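
   To make the mismatch concrete, here is a minimal, stdlib-only Python model.
It is a hypothetical sketch: the `Field` class and `column_paths` helper are
illustrative only and are not part of the arrow-go or pyarrow APIs. It assumes
the standard three-level Parquet LIST layout (`<name>.list.<element>`) and
derives the leaf column path twice: once keeping the Arrow element name "item",
as in the stored ARROW:schema, and once renaming it to "element", as the
Parquet schema does.

```python
# Hypothetical, simplified schema model; NOT the arrow-go or pyarrow API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    name: str
    kind: str  # "struct", "list", or a leaf type such as "int64"
    children: List["Field"] = field(default_factory=list)


def column_paths(f: Field, prefix=(), parquet_names: bool = False) -> List[str]:
    """Return the dotted leaf column paths under field f.

    A list is modeled with the three-level Parquet layout: the list name, a
    repeated group called "list", then the element field. With
    parquet_names=True the element is renamed to "element" (Parquet naming);
    otherwise the Arrow element name (e.g. "item") is kept.
    """
    path = prefix + (f.name,)
    if f.kind == "list":
        (elem,) = f.children
        elem_name = "element" if parquet_names else elem.name
        renamed = Field(elem_name, elem.kind, elem.children)
        return column_paths(renamed, path + ("list",), parquet_names)
    if f.kind == "struct":
        out: List[str] = []
        for child in f.children:
            out.extend(column_paths(child, path, parquet_names))
        return out
    return [".".join(path)]  # leaf column


# ops: list<item: struct<id: int64>>, as in the issue
ops = Field("ops", "list", [Field("item", "struct", [Field("id", "int64")])])

stored = column_paths(ops)                          # names from the stored ARROW:schema
physical = column_paths(ops, parquet_names=True)    # names in the Parquet schema
mismatches = [(s, p) for s, p in zip(stored, physical) if s != p]
print(mismatches)  # [('ops.list.item.id', 'ops.list.element.id')]
```

A consumer that matches columns by the stored ARROW:schema names would look
for `ops.list.item.id`, which does not exist in the file; a reader that
rebuilds names from the Parquet schema (as the pyarrow repack does) sees only
`ops.list.element.id`, so the two agree.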
   
   So my current theory is that this mismatch is causing the issue you're
seeing in Snowflake, if Snowflake relies on the ARROW:schema field names. Can
you try testing #746 and see whether it solves the issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
