alamb commented on code in PR #6870:
URL: https://github.com/apache/arrow-rs/pull/6870#discussion_r1887312372
##########
parquet/src/column/writer/mod.rs:
##########
@@ -878,24 +878,44 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
         }
     }
+    /// Returns `true` if this column's logical type is a UTF-8 string.
+    fn is_utf8(&self) -> bool {
+        self.get_descriptor().logical_type() == Some(LogicalType::String)
+            || self.get_descriptor().converted_type() == ConvertedType::UTF8
+    }
+
     fn truncate_min_value(&self, truncation_length: Option<usize>, data: &[u8]) -> (Vec<u8>, bool) {
         truncation_length
             .filter(|l| data.len() > *l)
-            .and_then(|l| match str::from_utf8(data) {
-                Ok(str_data) => truncate_utf8(str_data, l),
-                Err(_) => Some(data[..l].to_vec()),
-            })
+            .and_then(|l|
+                // don't do extra work if this column isn't UTF-8
+                if self.is_utf8() {
+                    match str::from_utf8(data) {
+                        Ok(str_data) => truncate_utf8(str_data, l),
+                        Err(_) => Some(data[..l].to_vec()),
Review Comment:
> To paraphrase a wise man I know: Every day I wake up. And then I remember Parquet exists. 🫤

I console myself with this quote from a former coworker:

> "Legacy Code, n: code that is getting the job done, and pretty well at that"

Not that we can't / shouldn't improve it, of course 🤣

Thanks again for all the help here.