alamb commented on issue #5530:
URL: https://github.com/apache/arrow-rs/issues/5530#issuecomment-2052125700

   > In parquet, ARROW:SCHEMA is used to identify the schema and extended info 
in the parquet file. I think the tricky point is that, without key-value metadata, 
both view and string are stored as the same thing. So I think we can add an 
optimization for "read string as stringview", but I also think storing a 
stringview as string may keep some legacy readers from being confused by it.
   
   This is a good point. Maybe something we could consider is to support 
setting the desired type directly on the reader
   
https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.html
   
   Something like 
   ```rust
   let reader = ArrowReaderBuilder::try_new(...)
     // override any schema declared in the file
     .with_schema(schema)
     .build()?;
   ```
   
   If we supported a similar API for the writer, then we could write 
`StringViewArray` without any extra copies, while the metadata stored in the file 
could say "read this as a StringView" 🤔 
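
To make the "stored as the same thing" point concrete, here is a minimal std-only sketch. It is not the parquet or arrow API: `Requested`, `Column`, and `decode` are hypothetical names standing in for a reader-side schema override. It assumes the values are laid out like Parquet's PLAIN `BYTE_ARRAY` encoding (a 4-byte little-endian length followed by the bytes) and shows the same buffer decoded either into owned strings or into views that copy nothing.

```rust
// Hypothetical model, not the parquet/arrow API: the same stored bytes are
// decoded either as owned strings or as zero-copy views, depending only on
// the type the caller asks for.

/// Type the reader is asked to produce (stands in for a schema override).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Requested {
    Utf8,
    Utf8View,
}

/// Decoded column: owned copies, or (offset, len) views into the shared buffer.
#[derive(Debug, PartialEq)]
enum Column<'a> {
    Utf8(Vec<String>),
    Utf8View {
        buffer: &'a [u8],
        views: Vec<(usize, usize)>,
    },
}

/// Decode length-prefixed values (u32 little-endian length, then the bytes).
/// Returns `None` if the buffer is truncated or a value is not valid UTF-8.
fn decode(buffer: &[u8], requested: Requested) -> Option<Column<'_>> {
    let mut views = Vec::new();
    let mut pos = 0;
    while pos < buffer.len() {
        let b = buffer.get(pos..pos + 4)?;
        let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
        let start = pos + 4;
        let end = start.checked_add(len)?;
        // Validate once; both output types rely on the bytes being UTF-8.
        std::str::from_utf8(buffer.get(start..end)?).ok()?;
        views.push((start, len));
        pos = end;
    }
    Some(match requested {
        // Owned strings: every value is copied out of the buffer.
        Requested::Utf8 => Column::Utf8(
            views
                .iter()
                .map(|&(o, l)| String::from_utf8_lossy(&buffer[o..o + l]).into_owned())
                .collect(),
        ),
        // Views: only offsets and lengths are recorded, no bytes are copied.
        Requested::Utf8View => Column::Utf8View { buffer, views },
    })
}
```

The stored bytes never change in this sketch; only the requested type does, which is why a reader-side override (or a hint in the file metadata) would be enough to choose between the two.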
   
   

