Hello,

This cannot be considered a bug, because Global does not support multi-byte character code sets.

[/usr/local/share/gtags/FAQ]
--------------------------------------------------------------
Q10. Does Global support multi-byte code set?
     Which character code set is supported?

A10. Global doesn't support multi-byte character code set yet.
     Global supports only ASCII and ASCII super-sets.
--------------------------------------------------------------

In Shift-JIS, "機能" consists of the following bytes:

0x22 "
0x8b (binary)
0x40 @
0x94 (binary)
0x5c \
0x22 "

Since 0x5c ('\') quotes the 0x22 ('"') that follows it, the parser treats
the rest of the source code as one long string. The parser cannot detect
this as a failure, because from its point of view it is processing the
input correctly.

Regards,
Shigio

On Fri, Nov 17, 2023 at 11:46 AM Johnny Cheng <itainan....@gmail.com> wrote:

> Hi,
>
> I found that if a file contains a specific sequence of CJK characters, the
> parser seems to fail to continue parsing the file.
>
> See the following example source file, let's say `test.c`, encoded in
> Shift-JIS (cp932).
>
> extern void printf(char * msg, ...);
>
> void Foo() {
>     char msg[] = "機能";
>     printf(msg);
> }
>
> void Hello() {
>     return;
> }
>
> (In case of mojibake due to encoding issues with the Kanji, screenshots are
> also provided below.)
>
> - *What occurred? (as is)*
>
> If you run the `gtags` command in the same folder, followed by `global -f
> test.c`, you get only one tag, `Foo`, but `Hello` should also be found.
>
> - *What did you expect from it?*
>
> However, if I modify the source a little bit, then the tag `Hello` is found.
> See the variations I tried in the table below.
>
> *Cases Table*
>
> Case:                    Bad Case
> Source code screenshot:  [image: image001.png]
>                          (Encoding is cp932, or shift-jis)
> global -f test.c:        Foo 4 test.cpp void Foo() {
>
> Case:                    Good Cases
> Source code screenshots: <image001.png>
>                          (Encoding is utf8)
>                          [image: image002.png]
>                          (Encoding is cp932, or shift-jis)
>                          [image: image003.png]
>                          (Encoding is cp932, or shift-jis)
> global -f test.c:        Foo 4 test.cpp void Foo() {
>                          Hello 9 test.cpp void Hello() {
>
> *My environment*
>
> OS:
>
>     Windows 11 Enterprise 22H2 64bit Build 22621.2428
>
> gtags --version:
>
>     gtags (Global) 6.6.9
>     Powered by Berkeley DB 1.85.
>     Copyright (c) 1996-2022 Tama Communications Corporation
>     License GPLv3+: GNU GPL version 3 or later
>     http://www.gnu.org/licenses/gpl.html
>     This is free software; you are free to change and redistribute it.
>     There is NO WARRANTY, to the extent permitted by law.
>
> *Possible Solutions*
>
> - Add a command-line encoding option to read the file properly.
> - Find out why such a file cannot be fully parsed, ignore this particular
>   error, and continue parsing.
>
> Also, if such a case happens, at least print an error message to inform
> the user that some files were not fully parsed.
>
> Johnny Cheng

--
Shigio YAMAGUCHI <shi...@gnu.org>
PGP fingerprint: 26F6 31B4 3D62 4A92 7E6F 1C33 969C 3BE3 89DD A6EB
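
To illustrate the byte-level explanation in the reply above, here is a minimal sketch in C of a byte-oriented scanner that looks for the end of a string literal. It is not Global's actual parser code; the function name `find_string_end` and its signature are hypothetical, chosen only for this example. It assumes, as the reply describes, that the scanner treats every 0x5c byte as a backslash that quotes the next byte, with no knowledge of the file's encoding.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from Global's sources.
 * buf[start] must be the opening quote (0x22) of a string literal.
 * Returns the index of the closing quote, or -1 if the literal never
 * closes within len bytes.
 */
long find_string_end(const unsigned char *buf, size_t len, size_t start)
{
    size_t i = start + 1;

    while (i < len) {
        if (buf[i] == 0x5c) {
            /* Backslash: skip it and the byte it quotes. */
            i += 2;
            continue;
        }
        if (buf[i] == 0x22) {
            /* Unquoted double quote: the literal ends here. */
            return (long)i;
        }
        i++;
    }
    return -1; /* ran off the end of the buffer inside the string */
}
```

Given the Shift-JIS bytes 0x22 0x8b 0x40 0x94 0x5c 0x22 followed by the rest of the file, the 0x5c that forms the second byte of the second character causes the intended closing quote to be skipped, so the scan keeps going past it. With the UTF-8 encoding of the same text (0xe6 0xa9 0x9f 0xe8 0x83 0xbd between the quotes) there is no 0x5c byte, and the closing quote is found where expected.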