TheNeuralBit commented on PR #41769: URL: https://github.com/apache/arrow/pull/41769#issuecomment-2136092470
Thanks very much for the context @jorisvandenbossche and @pitrou.

To be clear, de-duping metadata when store_schema is set is the write-side change that needs to wait until a corresponding read-side change has sufficient distribution.

How should we handle this particular change (copying schema-level metadata to Parquet file-level metadata independent of the store_schema flag)? If there's concern over opting everyone in to this, I could add another flag in ArrowWriterProperties, as suggested in #31723. It could be a tri-state to maintain backward compatibility:

- unset: use value of store_schema
- false: never copy schema metadata
- true: always copy schema metadata

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
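
For illustration, the tri-state option proposed in the comment above could resolve as in the following minimal Python sketch. The function name `should_copy_schema_metadata` and the parameter `copy_schema_metadata` are hypothetical and exist neither in Arrow nor in the PR; only `store_schema` comes from the discussion, and the real flag would be defined on the C++ `ArrowWriterProperties`.

```python
from typing import Optional


def should_copy_schema_metadata(
    copy_schema_metadata: Optional[bool],  # hypothetical tri-state flag
    store_schema: bool,                    # existing flag named in the comment
) -> bool:
    """Decide whether schema-level metadata is copied to file-level metadata."""
    if copy_schema_metadata is None:
        # unset: fall back to store_schema, which preserves today's behavior
        return store_schema
    # explicit False: never copy; explicit True: always copy
    return copy_schema_metadata


if __name__ == "__main__":
    for flag in (None, False, True):
        for store in (False, True):
            result = should_copy_schema_metadata(flag, store)
            print(f"flag={flag!s:5} store_schema={store!s:5} -> copy={result}")
```

Under this assumption, leaving the flag unset keeps the result tied to store_schema, so existing callers would see no change unless they opt in or out explicitly.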
