alamb commented on issue #6058:
URL: https://github.com/apache/arrow-rs/issues/6058#issuecomment-2761665362

   > [@alamb](https://github.com/alamb) 
[@XiangpengHao](https://github.com/XiangpengHao) Is utf8 validation in parquet 
reader necessary? I found a large proportion of 
`parquet::arrow::buffer::offset_buffer::OffsetBuffer<I>::check_valid_utf8` when 
profiling datafusion-comet native scan.
   
   I think it depends on how much you trust your input files to be valid. If 
you trust the files to contain only valid UTF-8 data, then disabling UTF-8 
validation is certainly an option.
   
   However, I think disabling this check would be somewhat cheating on 
benchmarks, as real systems should validate all user-supplied input for 
safety.
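
The trade-off between the validating and non-validating paths can be sketched with Rust's standard library alone. This is only an illustration, not the parquet reader's actual code path: `decode_checked` and `decode_trusted` are hypothetical names, and the sketch assumes the cost being discussed is a full scan of the string bytes.

```rust
use std::str;

/// Validating path: scans every byte and rejects malformed input.
/// (Hypothetical helper, not part of the parquet crate.)
fn decode_checked(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

/// Non-validating path: skips the scan entirely.
/// (Hypothetical helper, not part of the parquet crate.)
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8. Passing
/// anything else is undefined behavior, which is why this is only
/// reasonable for files you fully trust.
unsafe fn decode_trusted(bytes: &[u8]) -> &str {
    unsafe { str::from_utf8_unchecked(bytes) }
}

fn main() {
    let good = "héllo".as_bytes();
    let bad: &[u8] = &[b'h', 0xFF, b'i']; // 0xFF never appears in valid UTF-8

    // The checked path accepts valid data and rejects malformed data.
    assert_eq!(decode_checked(good).unwrap(), "héllo");
    assert!(decode_checked(bad).is_err());

    // Sound only because `good` was derived from a `&str`.
    let s = unsafe { decode_trusted(good) };
    assert_eq!(s, "héllo");
}
```

The unchecked variant is faster because it does no work, but it moves the burden of proof onto whoever produced the bytes, which is the trust question raised above.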
   
   Here is a related ticket:
   - https://github.com/apache/arrow-rs/issues/6701

