alamb commented on issue #5530: URL: https://github.com/apache/arrow-rs/issues/5530#issuecomment-2052125700
> In parquet, ARROW:SCHEMA is used to identify the schema and extended info in parquet file. I think the trick point is that, without key-value metadata, both view and string are stored as same thing. So I think we can add optimization for "read string as stringview", but I also think maybe storing a stringview as string can make some legacy reader not confused about it.

This is a good point. Maybe something we could consider is supporting setting the desired type directly on the reader:
https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.html

Something like:

```rust
let reader = ArrowReaderBuilder::try_new(...)
    // override any schema declared in the file
    .with_schema(schema)
    .build()?;
```

If we supported a similar API for the writer, then we could write `StringViewArray` without any extra copies, but the metadata stored in the file could say "read this as a StringView" 🤔

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
