Issue 179891
Summary [clang-format] Incorrectly formats values using UTF8 characters
Labels clang-format
Assignees
Reporter Thalley
If I define a `char` or `uint8_t` array initialized from a string literal containing multi-byte UTF-8 characters (e.g. `𠜎`), clang-format incorrectly breaks the line by inserting an unexpected newline.

Example, with the line length configured to 100 columns and tabs to 8 spaces:

```c
	/* 8 + 65 characters: string is moved to the next line */
	const uint8_t new_data_a[] = "𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎";
	/* 8 + 65 characters: stays on one line */
	const uint8_t new_data_b[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
	/* 8 + 87 characters: stays on one line */
	const uint8_t new_data_c[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
```

The line 

```c
	const uint8_t new_data_a[] = "𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎";
```

is unexpectedly reformatted as:
```c
	const uint8_t new_data_a[] =
		"𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎𠜎";
```

**Logs**

N/A

**System information**

```
clangd version 21.1.6
Features: linux
Platform: x86_64-pc-linux-gnu
```

Editor/LSP plugin:
https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-clangd (the issue also occurs when running `clang-format` directly, outside the editor/extension)

Operating system:
Linux


Originally created as https://github.com/clangd/clangd/issues/2582
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
