zeroshade commented on issue #744: URL: https://github.com/apache/arrow-go/issues/744#issuecomment-4184737408
So after a bit of digging, the only difference I could find between the file written by arrow-go and the version re-packed by pyarrow is a name mismatch on the list element in the schema. The stored `ARROW:schema` metadata names the element field "item", which is the Arrow convention, but we always rename it to "element" in the Parquet schema, which is the name the Parquet spec calls for. As a result, the Parquet column path is `ops.list.element.id` while the stored `ARROW:schema` says the element is named "item".

When the file is repacked with pyarrow, pyarrow reconstructs the Arrow schema from the native Parquet schema and uses "element", which makes the `ARROW:schema` consistent with the column paths. (By the way, the Parquet reader in arrow-go does the same thing and ignores the stored element name.)

So my current theory is that this mismatch is the issue you're seeing in Snowflake, if it relies on the field names in `ARROW:schema`. Can you try testing out #746 and see whether that solves the issue?
