tustvold commented on code in PR #4399:
URL: https://github.com/apache/arrow-rs/pull/4399#discussion_r1226383362
##########
parquet/src/column/writer/mod.rs:
##########
@@ -1230,24 +1211,19 @@ fn increment(mut data: Vec<u8>) -> Option<Vec<u8>> {
None
}
-/// Try and increment the the string's bytes from right to left, returning when the result is a valid UTF8 string.
-/// Returns `None` when it can't increment any byte.
+/// Try and increment the the string's bytes from right to left, returning when the result
+/// is a valid UTF8 string. Returns `None` when it can't increment any byte.
fn increment_utf8(mut data: Vec<u8>) -> Option<Vec<u8>> {
for idx in (0..data.len()).rev() {
let original = data[idx];
- let (mut byte, mut overflow) = data[idx].overflowing_add(1);
-
- // Until overflow: 0xFF -> 0x00
- while !overflow {
Review Comment:
Once a byte is too large to be part of a valid codepoint at that position,
continuing to increment it is not going to help.
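
To illustrate the point, here is a minimal sketch of a version that increments each byte at most once and moves one position to the left if the result is not valid UTF-8. It is an assumption about what the simplified loop could look like, not necessarily the code that ends up in the PR; only the function name and signature are taken from the hunk above.

```rust
/// Sketch: try to increment the bytes from right to left, incrementing each
/// byte at most once. Returns `None` when no single-byte increment yields
/// valid UTF-8.
fn increment_utf8(mut data: Vec<u8>) -> Option<Vec<u8>> {
    for idx in (0..data.len()).rev() {
        let original = data[idx];
        // 0xFF overflows to 0x00; skip it and try the byte to the left.
        let (byte, overflow) = original.overflowing_add(1);
        if !overflow {
            data[idx] = byte;
            if std::str::from_utf8(&data).is_ok() {
                return Some(data);
            }
            // One increment already made the sequence invalid; larger values
            // are not retried. Restore the byte and move left instead.
            data[idx] = original;
        }
    }
    None
}
```

For example, with the input `[0x61, 0x7F]` the last byte becomes `0x80`, which is not a valid single-byte character, so the sketch restores it and increments `0x61` to `0x62` instead.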