https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92919
Bug ID: 92919
Summary: invalid memory access in wide_str_to_charconst when running ucn2.C testcase (caught by hwasan)
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: matmal01 at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-none-linux-gnu

When running the ucn2.C testcase, hwasan catches an invalid access in the
function `wide_str_to_charconst`.

The problematic line is:

  const char16_t p = u'\U00110003';

This seems to be related to the size of the constant, since the line below
does not trigger the invalid access

  const char16_t j = u'\U0001F914';

yet changing that constant to the one below does.

  const char16_t j = u'\U0011F914';

HWASAN output is below.

==9608==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefdf000080bf at pc 0x000000651270
READ of size 1 at 0xefdf000080bf tags: 5f/79 (ptr/mem) in thread T0
    #0 0x65126c in SigTrap<0> ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:27
    #1 0x65126c in CheckAddress<(__hwasan::ErrorAction)0, (__hwasan::AccessType)0, 0> ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:88
    #2 0x65126c in __hwasan_load1 ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan.cpp:469
    #3 0x2b143dc in wide_str_to_charconst ../../gcc-pdtl/libcpp/charset.c:1980
    #4 0x2b143dc in cpp_interpret_charconst(cpp_reader*, cpp_token const*, unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2045
    #5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368
    #6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617
    #7 0x89c6bc in cp_lexer_get_preprocessor_token ../../gcc-pdtl/gcc/cp/parser.c:807
    #8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654
    #9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968
    #10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963
    #11 0xb50c90 in c_common_parse_file() ../../gcc-pdtl/gcc/c-family/c-opts.c:1185
    #12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458
    #13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280
    #14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419
    #15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39
    #16 0xffff93dd689c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)

[0xefdf000080a0,0xefdf000080c0) is a small unallocated heap chunk; size: 32 offset: 31
0xefdf000080bf is located 1 bytes to the left of 2-byte region [0xefdf000080c0,0xefdf000080c2)
allocated here:
    #0 0x652bc0 in __sanitizer_realloc ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_interceptors.cpp:146
    #1 0x2b95f40 in xrealloc ../../gcc-pdtl/libiberty/xmalloc.c:179
    #2 0x2b122ec in cpp_interpret_string_1 ../../gcc-pdtl/libcpp/charset.c:1753
    #3 0x2b14284 in cpp_interpret_string(cpp_reader*, cpp_string const*, unsigned long, cpp_string*, cpp_ttype) ../../gcc-pdtl/libcpp/charset.c:1784
    #4 0x2b14284 in cpp_interpret_charconst(cpp_reader*, cpp_token const*, unsigned int*, int*) ../../gcc-pdtl/libcpp/charset.c:2036
    #5 0xb31a48 in lex_charconst ../../gcc-pdtl/gcc/c-family/c-lex.c:1368
    #6 0xb35964 in c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int) ../../gcc-pdtl/gcc/c-family/c-lex.c:617
    #7 0x89c6bc in cp_lexer_get_preprocessor_token ../../gcc-pdtl/gcc/cp/parser.c:807
    #8 0x943cc0 in cp_lexer_new_main ../../gcc-pdtl/gcc/cp/parser.c:654
    #9 0x943cc0 in cp_parser_new ../../gcc-pdtl/gcc/cp/parser.c:3968
    #10 0x943cc0 in c_parse_file() ../../gcc-pdtl/gcc/cp/parser.c:42963
    #11 0xb50c90 in c_common_parse_file() ../../gcc-pdtl/gcc/c-family/c-opts.c:1185
    #12 0x16a49fc in compile_file ../../gcc-pdtl/gcc/toplev.c:458
    #13 0x6466bc in do_compile ../../gcc-pdtl/gcc/toplev.c:2280
    #14 0x6466bc in toplev::main(int, char**) ../../gcc-pdtl/gcc/toplev.c:2419
    #15 0x649468 in main ../../gcc-pdtl/gcc/main.c:39
    #16 0xffff93dd689c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x1f89c)
    #17 0x64cb24 (/home/ubuntu/working-directory/gcc-hwasan-install/libexec/gcc/aarch64-unknown-linux-gnu/10.0.0/cc1plus+0x64cb24)

Thread: T0 0xeffe00002000 stack: [0xffffe544a000,0xffffe944a000) sz: 67108864 tls: [0xffff94020000,0xffff94020850)
Memory tags around the buggy address (one tag corresponds to 16 bytes):
   0d 00 09 00 09 00 e7 09 09 00 e2 0c 9a 0c 0a 4a
   e7 0c 0d 00 0d 00 05 00 0d 00 08 00 08 00 08 00
   08 00 0b 00 0b 00 0b 00 0b 00 0e 00 0e 00 05 00
   0e 00 08 00 08 00 09 00 08 00 0c 00 0c 00 09 00
   0c 00 0c 00 0c 00 08 00 0c 00 0b 00 0b 00 07 00
   0b 00 0a 00 0a 00 09 00 0a 00 0c 00 0c 00 ec 0f
   0c 00 08 00 07 00 58 03 0d 00 5b 0f 08 00 4f 4f
   08 00 ab ab 09 00 09 00 09 00 09 00 09 00 09 00
=> 09 00 09 00 28 08 0e 00 cd 0b 79 [79] 02 72 71 71 <=
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Tags for short granules around the buggy address (one tag corresponds to 16 bytes):
   af .. .. .. 4b .. 9d .. 45 .. 3f .. 7b .. 11 ..
=> c9 .. 74 .. .. 28 d8 .. .. cd .. [..] 5f .. .. .. <=
   .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
See https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#short-granules for a description of short granule tags
SUMMARY: HWAddressSanitizer: tag-mismatch ../../../../gcc-pdtl/libsanitizer/hwasan/hwasan_checks.h:27 in SigTrap<0>

When running the testcase (with just the problematic line) under GDB, we can
stop on entry to the function `wide_str_to_charconst` and inspect the relevant
variables. It seems that `str.len` is 2, `cwidth` is 8, `bigend` is false, and
`width` is 16.

Hence the access on line 1980

  c = bigend ? str.text[off + i] : str.text[off + nbwc - i - 1];

becomes

  nbwc = width / 8
  off = 2 - (nbwc * 2)
  c = str.text[off + nbwc - i - 1]
  c = str.text[2 - nbwc - 1]            (with i = 0)
  c = str.text[2 - (width / 8) - 1]
  c = str.text[2 - (16 / 8) - 1]
  c = str.text[-1]

This accesses one byte before the text buffer (as mentioned in the HWASAN
dump).

(The inspection in GDB was largely to demonstrate this isn't a bug in HWASAN).
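
The index arithmetic above can be checked in isolation. The following is a minimal sketch in C, not code taken from libcpp: `first_read_index` is a hypothetical helper introduced only for illustration. It uses signed `ptrdiff_t` so that the negative index is visible directly, and it assumes `i = 0` together with the values observed in GDB (`str.len` 2, `width` 16, `cwidth` 8, `bigend` false).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libcpp: returns the index into str.text
   that the expression on line 1980 would read for i == 0, computed with
   signed arithmetic so an out-of-bounds result shows up as a negative
   number. */
ptrdiff_t first_read_index(ptrdiff_t len, ptrdiff_t width,
                           ptrdiff_t cwidth, int bigend)
{
    assert(cwidth > 0);

    ptrdiff_t nbwc = width / cwidth;  /* bytes per wide character */
    ptrdiff_t off = len - nbwc * 2;   /* mirrors off = 2 - (nbwc * 2) when len == 2 */
    ptrdiff_t i = 0;                  /* assumption: the first index used */

    /* Same shape as: bigend ? str.text[off + i] : str.text[off + nbwc - i - 1] */
    return bigend ? off + i : off + nbwc - i - 1;
}
```

With the values from GDB, `first_read_index(2, 16, 8, 0)` evaluates to -1, which agrees with the derivation and with the "1 bytes to the left" location reported by HWASAN.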