| Issue |
179891
|
| Summary |
[clang-format] Incorrectly formats values using UTF8 characters
|
| Labels |
clang-format
|
| Assignees |
|
| Reporter |
Thalley
|
If I define a `char` or `uint8_t` array using UTF-8 characters (e.g. `𠜎`), ClangFormat incorrectly formats it by adding an unexpected newline.
Example (configured with line length = 100 and tab width = 8 spaces):
```c
#include <stdint.h>

/* 8 + 65 characters: wrapped onto the next line */
const uint8_t new_data_a[] = "𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎";
/* 8 + 65 characters: stays on one line */
const uint8_t new_data_b[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
/* 8 + 87 characters: stays on one line */
const uint8_t new_data_c[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
```
The line
```c
const uint8_t new_data_a[] = "𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎";
```
is unexpectedly formatted as
```c
const uint8_t new_data_a[] =
"𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎";
```
**Logs**
N/A
**System information**
```
clangd version 21.1.6
Features: linux
Platform: x86_64-pc-linux-gnu
```
Editor/LSP plugin:
https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-clangd (the problem also occurs when running `clang-format` directly, outside the editor/extension)
Operating system:
Linux
Originally created as https://github.com/clangd/clangd/issues/2582
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs