Rafferty97 commented on PR #20604: URL: https://github.com/apache/datafusion/pull/20604#issuecomment-3994231659
@Omega359 Thanks for the additional context. I was actually surprised to learn that `Utf8View` survives a round trip through a parquet file, given that it has the same physical representation as `Utf8` under the hood.

I dug into the parquet reader code and found several mechanisms that control how fields are read back. These include Arrow type hints stored in the file metadata and a global `schema_force_view_types` option that defaults to true.

So, given that both `Utf8` and `Utf8View` are stored with the same physical representation in parquet files, would a simple solution for your use case be to configure DataFusion (or whatever system reads these parquet files back) to always read these fields as the same Arrow type?
