[kdevelop] [Bug 518035] New: [Kdevelop] Non-1-byte UTF-8 characters will cause semantic token and highlight misalignment in UTF-8 file.

bugzilla_noreply Mon, 23 Mar 2026 01:31:25 -0700

https://bugs.kde.org/show_bug.cgi?id=518035


            Bug ID: 518035
           Summary: [Kdevelop] Non-1-byte UTF-8 characters will cause
                    semantic token and highlight misalignment in UTF-8
                    file.
    Classification: Applications
           Product: kdevelop
Version First Reported In: unspecified
          Platform: Arch Linux
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: All editors
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Created attachment 190930
  --> https://bugs.kde.org/attachment.cgi?id=190930&action=edit
example full program

SUMMARY
Multi-byte (non-1-byte) UTF-8 characters cause semantic token and highlight
misalignment in UTF-8 files.

STEPS TO REPRODUCE
1. Open a UTF-8 file such as the following example program (the issue is not
limited to C++ source files):
#include <iostream>
int main()
{
    using namespace std;
    int loooooooooooooooooooooog = 1;
    // Each 1-byte UTF-8 character shifts the highlight by 0 positions
    cout << "ASCII" << loooooooooooooooooooooog;
    // Each 2-byte UTF-8 character shifts the highlight by 1 position
    cout << "Normalα" << loooooooooooooooooooooog;
    // Each 3-byte UTF-8 character shifts the highlight by 2 positions
    cout << "Normal中" << loooooooooooooooooooooog;
    // Each 4-byte UTF-8 character shifts the highlight by 2 positions
    cout << "Normal😄" << loooooooooooooooooooooog;
    return 0;
}


OBSERVED RESULT
The position where the highlight of the variable loooooooooooooooooooooog
begins is marked with "^" (the long name is only there to make the misplacement
visible):
    int loooooooooooooooooooooog = 1;
    // Each 1-byte UTF-8 character shifts the highlight by 0 positions
    cout << "ASCII" << loooooooooooooooooooooog;
                                          ^~~~~~~~~~~~~
    // Each 2-byte UTF-8 character shifts the highlight by 1 position
    cout << "Normalα" << loooooooooooooooooooooog;
                                                   ^~~~~~~~~~~~~~~
    // Each 3-byte UTF-8 character shifts the highlight by 2 positions
    cout << "Normal中" << loooooooooooooooooooooog;
                                                        ^~~~~~~~~~~~~~~~~~~~
    // Each 4-byte UTF-8 character shifts the highlight by 2 positions
    cout << "Normal😄" << loooooooooooooooooooooog;
                                                         ^~~~~~~~~~~~~~~~~~~
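For reference, the shifts listed above (0, 1, 2, 2) equal the number of UTF-8 bytes of each string minus its number of UTF-16 code units. The sketch below only checks that arithmetic; it is a standalone illustration, not KDevelop code, and the names `Lengths`, `measure` and `shift` are hypothetical. It assumes valid UTF-8 input and spells the non-ASCII characters as byte escapes so the result does not depend on the source file encoding.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration only; not taken from KDevelop or Qt.
struct Lengths {
    std::size_t utf8Bytes;   // length in bytes (UTF-8 code units)
    std::size_t utf16Units;  // length in UTF-16 code units
};

// Assumes `s` is valid UTF-8; no validation is performed.
constexpr Lengths measure(std::string_view s)
{
    Lengths n{0, 0};
    for (char ch : s) {
        const auto b = static_cast<unsigned char>(ch);
        ++n.utf8Bytes;
        if ((b & 0xC0) == 0x80)
            continue;  // continuation byte: belongs to the previous character
        // Lead bytes >= 0xF0 start a 4-byte sequence, which needs a
        // surrogate pair (2 code units) in UTF-16; everything else needs 1.
        n.utf16Units += (b >= 0xF0) ? 2 : 1;
    }
    return n;
}

// Number of positions by which a byte offset overshoots the UTF-16 offset.
constexpr std::size_t shift(std::string_view s)
{
    const Lengths n = measure(s);
    return n.utf8Bytes - n.utf16Units;
}

static_assert(shift("ASCII") == 0);                   // 1-byte characters only
static_assert(shift("Normal\xCE\xB1") == 1);          // α  (U+03B1, 2 bytes)
static_assert(shift("Normal\xE4\xB8\xAD") == 2);      // 中 (U+4E2D, 3 bytes)
static_assert(shift("Normal\xF0\x9F\x98\x84") == 2);  // 😄 (U+1F604, 4 bytes)
```

The block compiles as C++17; the `static_assert` lines fail at compile time if any of the four shifts differs from the values reported above.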


EXPECTED RESULT
The highlight should be placed like this:
    int loooooooooooooooooooooog = 1;
    cout << "ASCII" << loooooooooooooooooooooog;
                                          ^~~~~~~~~~~~~
    cout << "Normalα" << loooooooooooooooooooooog;
                                                  ^~~~~~~~~~~~~~~
    cout << "Normal中" << loooooooooooooooooooooog;
                                                   ^~~~~~~~~~~~~~~~~~~~
    cout << "Normal😄" << loooooooooooooooooooooog;
                                                    ^~~~~~~~~~~~~~~~~~~


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 
KDE Plasma Version: 6.6.3
KDE Frameworks Version: 6.24.0
Qt Version: 6.10.2

ADDITIONAL INFORMATION
The issue occurs in every UTF-8 file that contains characters longer than 1
byte: KDevelop misplaces the highlight and semantic tokens of the characters
that follow them. I know that QString uses UTF-16 internally, and I guess that
the "kdevelop/kdevplatform/language/duchain/" module passes the positions to
the Qt front end, but some encoding conversion issue there leads to the
misplacement.
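If the guess above is right, and a column counted in UTF-8 bytes is being used where a column counted in UTF-16 code units is expected, then the position would need to be converted before it is handed to the editor. The following is a minimal sketch of such a conversion under that assumption. `utf16Column` and `exampleLine` are hypothetical names, not KDevelop or Qt API, and the code assumes valid UTF-8 and a byte column that falls on a character boundary.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only; not a KDevelop or Qt function.
// Converts a 0-based column counted in UTF-8 bytes into a 0-based column
// counted in UTF-16 code units.
constexpr std::size_t utf16Column(std::string_view line, std::size_t byteColumn)
{
    std::size_t column = 0;
    for (std::size_t i = 0; i < byteColumn && i < line.size(); ++i) {
        const auto b = static_cast<unsigned char>(line[i]);
        if ((b & 0xC0) == 0x80)
            continue;  // continuation byte: already counted with its lead byte
        column += (b >= 0xF0) ? 2 : 1;  // 4-byte sequences become a surrogate pair
    }
    return column;
}

// The "Normal中" line of the example program, with 中 (U+4E2D) written as its
// three UTF-8 bytes so the literal does not depend on the source encoding.
constexpr std::string_view exampleLine =
    "    cout << \"Normal\xE4\xB8\xAD\" << loooooooooooooooooooooog;";

// Byte column where the identifier starts.
constexpr std::size_t byteColumn = exampleLine.find("looo");

static_assert(byteColumn == 27);                            // counted in bytes
static_assert(utf16Column(exampleLine, byteColumn) == 25);  // counted in UTF-16 units
```

The two columns differ by 2, which matches the 2-position misplacement reported for the line containing a 3-byte character.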

