https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92919

            Bug ID: 92919
           Summary: invalid memory access in wide_str_to_charconst when
                    running ucn2.C testcase (caught by hwasan)
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matmal01 at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-none-linux-gnu

When running the ucn2.C testcase, hwasan catches an invalid access in the
function `wide_str_to_charconst`.

The problematic line is:
const char16_t p = u'\U00110003';

This seems to depend on the value of the constant, since the line below does
not trigger the invalid access:
const char16_t j = u'\U0001F914';
yet changing that constant to the following does:
const char16_t j = u'\U0011F914';
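
One property that separates the failing constants from the passing one is that
both failing values lie above U+10FFFF, the largest Unicode code point. Whether
that is the actual trigger is not established in this report; the sketch below
only checks the three values, and `in_unicode_range` is a hypothetical helper,
not a GCC function.

```cpp
#include <cstdint>

// Largest code point defined by Unicode.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Hypothetical helper for illustration only (not part of GCC or libcpp).
constexpr bool in_unicode_range(std::uint32_t cp)
{
    return cp <= kMaxCodePoint;
}

// The constant that does not trigger the report is in range...
static_assert(in_unicode_range(0x0001F914), "U+1F914 is a valid code point");
// ...while both constants that trigger it are out of range.
static_assert(!in_unicode_range(0x00110003), "0x110003 exceeds U+10FFFF");
static_assert(!in_unicode_range(0x0011F914), "0x11F914 exceeds U+10FFFF");
```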

HWASAN output is below.


==9608==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefdf000080bf at pc 0x000000651270
READ of size 1 at 0xefdf000080bf tags: 5f/79 (ptr/mem) in thread T0
    #0 0x65126c in SigTrap<0> ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:27
    #1 0x65126c in CheckAddress<(__hwasan::ErrorAction)0, (__hwasan::AccessType)0, 0> ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:88
    #2 0x65126c in __hwasan_load1 ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan.cpp:469
    #3 0x2b143dc in wide_str_to_charconst ../../gcc-pdtl/libcpp/charset.c:1980
    #4 0x2b143dc in cpp_interpret_charconst(cpp_reader*, cpp_token const*, unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2045
    #5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368
    #6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617
    #7 0x89c6bc in cp_lexer_get_preprocessor_token ../../gcc-pdtl/gcc/cp/parser.c:807
    #8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654
    #9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968
    #10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963
    #11 0xb50c90 in c_common_parse_file() ../../gcc-pdtl/gcc/c-family/c-opts.c:1185
    #12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458
    #13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280
    #14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419
    #15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39
    #16 0xffff93dd689c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)

[0xefdf000080a0,0xefdf000080c0) is a small unallocated heap chunk; size: 32 offset: 31
0xefdf000080bf is located 1 bytes to the left of 2-byte region [0xefdf000080c0,0xefdf000080c2)
allocated here:
    #0 0x652bc0 in __sanitizer_realloc ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_interceptors.cpp:146
    #1 0x2b95f40 in xrealloc ../../gcc-pdtl/libiberty/xmalloc.c:179
    #2 0x2b122ec in cpp_interpret_string_1 ../../gcc-pdtl/libcpp/charset.c:1753
    #3 0x2b14284 in cpp_interpret_string(cpp_reader*, cpp_string const*, unsigned long, cpp_string*, cpp_ttype) ../../gcc-pdtl/libcpp/charset.c:1784
    #4 0x2b14284 in cpp_interpret_charconst(cpp_reader*, cpp_token const*, unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2036
    #5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368
    #6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617
    #7 0x89c6bc in cp_lexer_get_preprocessor_token ../../gcc-pdtl/gcc/cp/parser.c:807
    #8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654
    #9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968
    #10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963
    #11 0xb50c90 in c_common_parse_file() ../../gcc-pdtl/gcc/c-family/c-opts.c:1185
    #12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458
    #13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280
    #14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419
    #15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39
    #16 0xffff93dd689c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)
    #17 0x64cb24 (/home/ubuntu/working-directory/gcc-hwasan-install/libexec/gcc/aarch64-unknown-linux-gnu/10.0.0/cc1plus+0x64cb24)

Thread: T0 0xeffe00002000 stack: [0xffffe544a000,0xffffe944a000) sz: 67108864 tls: [0xffff94020000,0xffff94020850)
Memory tags around the buggy address (one tag corresponds to 16 bytes):
   0d  00  09  00  09  00  e7  09  09  00  e2  0c  9a  0c  0a  4a   
   e7  0c  0d  00  0d  00  05  00  0d  00  08  00  08  00  08  00   
   08  00  0b  00  0b  00  0b  00  0b  00  0e  00  0e  00  05  00   
   0e  00  08  00  08  00  09  00  08  00  0c  00  0c  00  09  00   
   0c  00  0c  00  0c  00  08  00  0c  00  0b  00  0b  00  07  00   
   0b  00  0a  00  0a  00  09  00  0a  00  0c  00  0c  00  ec  0f   
   0c  00  08  00  07  00  58  03  0d  00  5b  0f  08  00  4f  4f   
   08  00  ab  ab  09  00  09  00  09  00  09  00  09  00  09  00   
=> 09  00  09  00  28  08  0e  00  cd  0b  79 [79] 02  72  71  71 <=
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
   00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00   
Tags for short granules around the buggy address (one tag corresponds to 16 bytes):
   af  ..  ..  ..  4b  ..  9d  ..  45  ..  3f  ..  7b  ..  11  ..   
=> c9  ..  74  ..  ..  28  d8  ..  ..  cd  .. [..] 5f  ..  ..  .. <=
   ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..   
See https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#short-granules for a description of short granule tags
SUMMARY: HWAddressSanitizer: tag-mismatch ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:27 in SigTrap<0>





When running the testcase (with just the problematic line) under GDB, we can
stop on entry to the function `wide_str_to_charconst` and inspect the relevant
variables.

There, `str.len` is 2, `cwidth` is 8, `bigend` is false, and `width` is 16.
Hence the access on line 1980
c = bigend ? str.text[off + i] : str.text[off + nbwc - i - 1];

becomes, with i = 0:

nbwc = width / 8
off = 2 - (nbwc * 2)
c = str.text[off + nbwc - i - 1]
c = str.text[2 - (nbwc * 2) + nbwc - 0 - 1]
c = str.text[2 - nbwc - 1]
c = str.text[2 - (width / 8) - 1]
c = str.text[2 - (16 / 8) - 1]
c = str.text[-1]

This reads one byte before the start of the text buffer, which matches the
HWASAN dump.
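
The index arithmetic can be replayed outside the compiler. The sketch below is
not libcpp code: `first_read_index` is a hypothetical helper that mirrors only
the little-endian expression quoted above for i = 0, and it uses signed `long`
throughout so that a negative index is visible directly (the types used in
charset.c may differ).

```cpp
// Hypothetical helper for illustration only (not part of libcpp).
// Returns the index used for the first read (i = 0) on the little-endian
// path, following the expression quoted above from charset.c line 1980.
constexpr long first_read_index(long len, long width, long cwidth)
{
    long nbwc = width / cwidth;      // bytes per wide character
    long off = len - (nbwc * 2);     // offset as computed in the report
    long i = 0;                      // first access
    return off + nbwc - i - 1;
}

// Values observed in GDB: str.len = 2, width = 16, cwidth = 8.
static_assert(first_read_index(2, 16, 8) == -1,
              "reads one byte before the buffer");

// With a 4-byte buffer and the same widths the index stays inside it.
static_assert(first_read_index(4, 16, 8) == 1,
              "index is within the buffer");
```

Under these assumptions any `len` smaller than `2 * nbwc` yields an index
before the start of the buffer. Locals in a constexpr function need C++14 or
later.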

(The inspection in GDB was largely to demonstrate this isn't a bug in HWASAN).
