Issue 86873
Summary Failure to convert branchy code to branchless
Labels missed-optimization
Assignees
Reporter Kmeakin
    https://godbolt.org/z/cn3d5fGs7

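Consider these 3 functions for computing the length of a UTF-8 codepoint from its leading byte. They agree on every input except the continuation-byte range `0x80..=0xBF`, where `len_utf8_match` returns 4 and the other two return 1: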
```rust
#[no_mangle]
fn len_utf8_match(c: u8) -> usize {
    match c {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4,
    }
}

#[no_mangle]
fn len_utf8_branchless(c: u8) -> usize {
    let mut ret = 1;
    if (c & 0b1100_0000) == 0b1100_0000 {
        ret = 2;
    }
    if (c & 0b1110_0000) == 0b1110_0000 {
        ret = 3;
    }
    if (c & 0b1111_0000) == 0b1111_0000 {
        ret = 4;
    }
    ret
}

#[no_mangle]
fn len_utf8_branchy(c: u8) -> usize {
    if (c & 0b1111_0000) == 0b1111_0000 {
        return 4;
    }
    if (c & 0b1110_0000) == 0b1110_0000 {
        return 3;
    }
    if (c & 0b1100_0000) == 0b1100_0000 {
        return 2;
    }
    1
}
```

For aarch64, `len_utf8_branchless` is the clear winner. For x86_64 and 64-bit RISC-V, I think the best results are from `len_utf8_branchless` and `len_utf8_branchy`.

In any case, `len_utf8_branchless` and `len_utf8_branchy` are equivalent, so identical assembly should be produced for both.
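Since the input is a single `u8`, the equivalence claim can be checked exhaustively over all 256 values. The following is a minimal sketch of such a check, not part of the original reproducer: the functions are copied from above with `#[no_mangle]` dropped, and the helpers `first_disagreement` and `match_disagreements` are hypothetical names introduced only for illustration.

```rust
// Copies of the three functions from the report, without #[no_mangle].
fn len_utf8_match(c: u8) -> usize {
    match c {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4,
    }
}

fn len_utf8_branchless(c: u8) -> usize {
    let mut ret = 1;
    if (c & 0b1100_0000) == 0b1100_0000 {
        ret = 2;
    }
    if (c & 0b1110_0000) == 0b1110_0000 {
        ret = 3;
    }
    if (c & 0b1111_0000) == 0b1111_0000 {
        ret = 4;
    }
    ret
}

fn len_utf8_branchy(c: u8) -> usize {
    if (c & 0b1111_0000) == 0b1111_0000 {
        return 4;
    }
    if (c & 0b1110_0000) == 0b1110_0000 {
        return 3;
    }
    if (c & 0b1100_0000) == 0b1100_0000 {
        return 2;
    }
    1
}

// Hypothetical helper (not from the report): the first byte on which the
// branchless and branchy versions disagree, or None if they never do.
fn first_disagreement() -> Option<u8> {
    (0..=u8::MAX).find(|&c| len_utf8_branchless(c) != len_utf8_branchy(c))
}

// Hypothetical helper (not from the report): every byte on which the
// match-based version differs from the branchy version.
fn match_disagreements() -> Vec<u8> {
    (0..=u8::MAX)
        .filter(|&c| len_utf8_match(c) != len_utf8_branchy(c))
        .collect()
}
```

`first_disagreement()` returns `None`, which confirms that the two `if`-based versions compute the same function. `match_disagreements()` returns exactly the continuation bytes `0x80..=0xBF`, so the `match` version is only comparable to the other two on non-continuation input.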
