Hello,
This cannot be considered a bug, because Global does not
support multi-byte character code sets.

[/usr/local/share/gtags/FAQ]
--------------------------------------------------------------
Q10. Does Global support multi-byte code set?
     Which character code set is supported?

A10. Global doesn't support multi-byte character code set yet.
     Global supports only ASCII and ASCII super-sets.
--------------------------------------------------------------

In Shift-JIS, the string literal "機能" consists of the following bytes:

0x22    "
0x8b    (binary; first byte of 機)
0x40    @ (second byte of 機)
0x94    (binary; first byte of 能)
0x5c    \ (second byte of 能)
0x22    "

Since 0x5c ('\') escapes the following 0x22 ('"'), the parser treats the
rest of the source code as one long string. This cannot be reported as
a failure, because the parser is processing its input correctly.
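
To illustrate the byte sequence above, here is a minimal sketch in C. It is
not Global's actual parser code; the names sjis_literal,
find_closing_quote_bytes, is_sjis_lead and find_closing_quote_sjis are
hypothetical. The first scanner treats every byte as ASCII. The second one
assumes the input is Shift-JIS and skips two-byte characters (lead bytes
0x81-0x9f and 0xe0-0xfc) before looking for a backslash or a quote.

```c
#include <stddef.h>

/* Bytes of the C source fragment "機能" saved as Shift-JIS (cp932). */
static const unsigned char sjis_literal[] = {
    0x22, 0x8b, 0x40, 0x94, 0x5c, 0x22
};

/*
 * Byte-oriented scan. s[0] is the opening quote. Returns the index of
 * the closing quote, or -1 if none is found within len bytes.
 */
static long find_closing_quote_bytes(const unsigned char *s, size_t len)
{
    size_t i = 1;

    while (i < len) {
        if (s[i] == '\\') {     /* backslash escapes the next byte */
            i += 2;
            continue;
        }
        if (s[i] == '"')
            return (long)i;
        i++;
    }
    return -1;
}

/* Nonzero if c is the first byte of a two-byte Shift-JIS character. */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/*
 * Same scan, but two-byte Shift-JIS characters are skipped as a unit,
 * so a trailing 0x5c byte is never mistaken for a backslash.
 */
static long find_closing_quote_sjis(const unsigned char *s, size_t len)
{
    size_t i = 1;

    while (i < len) {
        if (is_sjis_lead(s[i])) {   /* skip lead byte and trail byte */
            i += 2;
            continue;
        }
        if (s[i] == '\\') {
            i += 2;
            continue;
        }
        if (s[i] == '"')
            return (long)i;
        i++;
    }
    return -1;
}
```

For the six bytes above, find_closing_quote_bytes() returns -1 (no closing
quote is found, so the string appears to run on), while
find_closing_quote_sjis() returns 5, the index of the real closing quote.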

Regards,
Shigio

On Fri, Nov 17, 2023 at 11:46 AM Johnny Cheng <itainan....@gmail.com> wrote:

> Hi,
>
> I found that if a file contains a specific CJK character sequence, the
> parser seems to fail to continue parsing the file.
>
> See the following example source file, say `test.c`, encoded in
> Shift-JIS (cp932).
>
> extern void printf(char * msg, ...);
>
> void Foo() {
>     char msg[] = "機能";
>     printf(msg);
> }
>
> void Hello() {
>     return;
> }
>
> (In case of mojibake due to encoding issue for Kanji, screenshots are also
> provided below.)
>
>    - *What occurred? (as is)*
>
> Now if you run the `gtags` command in the same folder, followed by
> `global -f test.c`, you get only one tag, `Foo`, although `Hello` should
> also be found.
>
>    - *What did you expect from it?*
>
> However, if I modify the source a little bit, then tag `Hello` is found.
> See variations I tried in the table below.
>
>
> *Cases Table*
>
> Case: Bad Case
> Source code screenshot:
>     [image: image001.png] (Encoding is cp932, or shift-jis)
> global -f test.c:
>     Foo                 4 test.cpp         void Foo() {
>
> Case: Good Cases
> Source code screenshots:
>     <image001.png> (Encoding is utf8)
>     [image: image002.png] (Encoding is cp932, or shift-jis)
>     [image: image003.png] (Encoding is cp932, or shift-jis)
> global -f test.c:
>     Foo                 4 test.cpp         void Foo() {
>     Hello               9 test.cpp         void Hello() {
>
>
> *My environment*
>
> OS: Windows 11 Enterprise 22H2 64bit Build 22621.2428
>
> gtags --version:
>
>     gtags (Global) 6.6.9
>     Powered by Berkeley DB 1.85.
>     Copyright (c) 1996-2022 Tama Communications Corporation
>     License GPLv3+: GNU GPL version 3 or later
>     http://www.gnu.org/licenses/gpl.html
>     This is free software; you are free to change and redistribute it.
>     There is NO WARRANTY, to the extent permitted by law.
>
>
> *Possible Solutions*
>
>    - Add a command line encoding option to read the file properly.
>    - Find out why such a file cannot be fully parsed, ignore that
>    particular error, and continue parsing.
>
> Also, if this happens, at least print an error message to inform the
> user that some files were not fully parsed.
>
>
>
>
>
> Johnny Cheng
>
>

-- 
Shigio YAMAGUCHI <shi...@gnu.org>
PGP fingerprint:
26F6 31B4 3D62 4A92 7E6F  1C33 969C 3BE3 89DD A6EB
