Rafferty97 commented on PR #20604:
URL: https://github.com/apache/datafusion/pull/20604#issuecomment-3994231659

   @Omega359 Thanks for the additional context. I was actually surprised to 
learn that `Utf8View` survives a roundtrip through a parquet file, given that it 
shares the same physical representation as `Utf8` under the hood. I dug into the 
parquet reader code and found there are various mechanisms that control how 
fields are read back, including type hints in the file metadata and a global 
`schema_force_view_types` option that defaults to `true`.
   
   So, given that both `Utf8` and `Utf8View` materialise into the same physical 
representation in the parquet files, would a simple solution for your use case 
be to configure DataFusion (or whatever system is reading back these parquet 
files) to always read these fields as the same Arrow type?
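
   To make the idea concrete, here is a rough, hypothetical model of the two 
mechanisms above. It is not the actual DataFusion or arrow-rs code: 
`ArrowStringType` and `resolve_read_type` are made-up names, and the flag only 
stands in for `schema_force_view_types`. The point it illustrates is that with 
the flag on, files written as `Utf8` and as `Utf8View` read back as the same type.

```rust
/// Hypothetical subset of Arrow string types relevant to this discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowStringType {
    Utf8,
    Utf8View,
}

/// Hypothetical, simplified model of how a reader picks the Arrow type for a
/// parquet string column. NOT the real DataFusion/arrow-rs logic.
///
/// * `embedded_hint`: the type recorded in the file's embedded Arrow schema
///   metadata, if the writer stored one.
/// * `force_view_types`: stands in for the `schema_force_view_types` option.
fn resolve_read_type(
    embedded_hint: Option<ArrowStringType>,
    force_view_types: bool,
) -> ArrowStringType {
    if force_view_types {
        // Forcing view types overrides whatever type the writer used.
        ArrowStringType::Utf8View
    } else {
        // Otherwise follow the writer's hint, falling back to plain Utf8.
        embedded_hint.unwrap_or(ArrowStringType::Utf8)
    }
}

fn main() {
    // Both files hold the same physical string data; they differ only in the
    // embedded type hint left by the writer.
    let written_as = [ArrowStringType::Utf8, ArrowStringType::Utf8View];

    for force in [true, false] {
        let read_back: Vec<ArrowStringType> = written_as
            .iter()
            .map(|&w| resolve_read_type(Some(w), force))
            .collect();
        println!(
            "force_view_types={}: {:?} -> {:?}",
            force, written_as, read_back
        );
    }
}
```

   In this model, only the forced setting makes the written type irrelevant, 
which is the behaviour the suggestion above relies on.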
