Hello,
While investigating memory leaks in sed, I think I found one
in gnulib's regex module.
It happens with character sets in multibyte locales: the range arrays
are allocated but never freed, due to an incorrect "#ifdef _LIBC"
in free_charset.
It can be reproduced with:
==============================================
$ echo 1 | LC_ALL=en_CA.utf8 ./sed/sed '/[0-9]/p'
[....]
==1176==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x7fdee483c6a0 in __interceptor_realloc ../../../../gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:105
#1 0x452fc4 in build_range_exp lib/regcomp.c:2779
#2 0x453f26 in parse_bracket_exp lib/regcomp.c:3250
#3 0x450b22 in parse_expression lib/regcomp.c:2302
#4 0x4504c6 in parse_branch lib/regcomp.c:2221
#5 0x45015e in parse_reg_exp lib/regcomp.c:2173
#6 0x44ff03 in parse lib/regcomp.c:2141
#7 0x4474f1 in re_compile_internal lib/regcomp.c:803
#8 0x444999 in rpl_re_compile_pattern lib/regcomp.c:230
#9 0x41384e in compile_regex_1 sed/regexp.c:115
#10 0x413f81 in compile_regex sed/regexp.c:194
#11 0x406aec in compile_address sed/compile.c:962
#12 0x40710a in compile_program sed/compile.c:1038
#13 0x40a21f in compile_string sed/compile.c:1574
#14 0x415beb in main sed/sed.c:369
#15 0x7fdee41cc2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x7fdee483c6a0 in __interceptor_realloc ../../../../gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:105
#1 0x452f7c in build_range_exp lib/regcomp.c:2777
#2 0x453f26 in parse_bracket_exp lib/regcomp.c:3250
#3 0x450b22 in parse_expression lib/regcomp.c:2302
#4 0x4504c6 in parse_branch lib/regcomp.c:2221
#5 0x45015e in parse_reg_exp lib/regcomp.c:2173
#6 0x44ff03 in parse lib/regcomp.c:2141
#7 0x4474f1 in re_compile_internal lib/regcomp.c:803
#8 0x444999 in rpl_re_compile_pattern lib/regcomp.c:230
#9 0x41384e in compile_regex_1 sed/regexp.c:115
#10 0x413f81 in compile_regex sed/regexp.c:194
#11 0x406aec in compile_address sed/compile.c:962
#12 0x40710a in compile_program sed/compile.c:1038
#13 0x40a21f in compile_string sed/compile.c:1574
#14 0x415beb in main sed/sed.c:369
#15 0x7fdee41cc2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
==============================================
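To take sed out of the picture, a small helper along these lines should reach the same code path through the POSIX interface (regcomp, regexec, regfree). This is an untested sketch: the function name compile_match_free is made up for illustration, and it only exercises the gnulib code if the program is actually linked against gnulib's regex replacement rather than the system libc's.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Compile PATTERN under LOCALE_NAME, match it against TEXT, then free it.
   Returns 0 on a match, REG_NOMATCH on no match, -1 if the locale is
   not available, or the regcomp error code if compilation fails.
   (The name and return convention are illustrative, not from gnulib.)  */
static int
compile_match_free (const char *locale_name, const char *pattern,
                    const char *text)
{
  regex_t re;
  int rc;

  assert (locale_name != NULL && pattern != NULL && text != NULL);

  /* The bracket expression is only parsed as a multibyte character set
     when the active locale is a multibyte one.  */
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  rc = regcomp (&re, pattern, REG_NOSUB);
  if (rc != 0)
    return rc;

  rc = regexec (&re, text, 0, NULL, 0);

  /* regfree is where the compiled character set should be released.  */
  regfree (&re);
  return rc;
}
```

Calling compile_match_free ("en_CA.utf8", "[0-9]", "1") from main in a program built with -fsanitize=address would be the analogue of the sed command above, assuming that locale is installed.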
I think the attached patch fixes the issue.
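To show the effect of the change in isolation, here is a reduced sketch of the pattern. It is not the gnulib code: struct cset_sketch, add_range, the two release functions and the SKETCH_LIBC macro (standing in for _LIBC) are all hypothetical, and the wchar_t element type is an assumption that happens to match the 4-byte allocations in the report.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for re_charset_t, reduced to the range arrays.
   The members exist whether or not SKETCH_LIBC is defined.  */
struct cset_sketch
{
  wchar_t *range_starts;
  wchar_t *range_ends;
  size_t nranges;
};

/* Append one range, growing both arrays by one slot with realloc.  */
static int
add_range (struct cset_sketch *cs, wchar_t start, wchar_t end)
{
  wchar_t *s, *e;

  assert (cs != NULL);
  s = realloc (cs->range_starts, (cs->nranges + 1) * sizeof *s);
  if (s == NULL)
    return -1;
  cs->range_starts = s;
  e = realloc (cs->range_ends, (cs->nranges + 1) * sizeof *e);
  if (e == NULL)
    return -1;
  cs->range_ends = e;
  s[cs->nranges] = start;
  e[cs->nranges] = end;
  cs->nranges++;
  return 0;
}

/* Before the patch: the frees sit inside the conditional block, so a
   build without SKETCH_LIBC releases nothing.
   Returns the number of arrays released.  */
static int
release_ranges_before (struct cset_sketch *cs)
{
  int released = 0;
#ifdef SKETCH_LIBC
  free (cs->range_starts);
  free (cs->range_ends);
  cs->range_starts = cs->range_ends = NULL;
  cs->nranges = 0;
  released = 2;
#else
  (void) cs;
#endif
  return released;
}

/* After the patch: the arrays are released unconditionally.  */
static int
release_ranges_after (struct cset_sketch *cs)
{
  free (cs->range_starts);
  free (cs->range_ends);
  cs->range_starts = cs->range_ends = NULL;
  cs->nranges = 0;
  return 2;
}
```

With SKETCH_LIBC undefined, adding a single range such as '0'-'9' and then calling release_ranges_before leaves both one-element arrays allocated, which corresponds to the two direct leaks reported above; release_ranges_after frees them.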
Comments welcome.
-assaf
From b1c219384e22fa97ee6fbad7f831b2dddde18c54 Mon Sep 17 00:00:00 2001
From: Assaf Gordon <assafgor...@gmail.com>
Date: Tue, 31 Jul 2018 12:18:26 -0600
Subject: [PATCH] regex: fix memory leak in multibyte character set regexes.
* lib/regcomp.c (free_charset): Always free range_{starts,ends} member
variables; They are defined in 'struct re_charset_t' even if not _LIBC.
---
ChangeLog | 6 ++++++
lib/regcomp.c | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/ChangeLog b/ChangeLog
index 07e970c10..2d245ebc3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2018-07-31 Assaf Gordon <assafgor...@gmail.com>
+
+ regex: fix memory leak in multibyte character set regexes.
+ * lib/regcomp.c (free_charset): Always free range_{starts,ends} member
+ variables; They are defined in 'struct re_charset_t' even if not _LIBC.
+
2018-07-27 Bruno Haible <br...@clisp.org>
iswcntrl: Mention minor problem on macOS.
diff --git a/lib/regcomp.c b/lib/regcomp.c
index 7b5ddaad0..b08a0de6c 100644
--- a/lib/regcomp.c
+++ b/lib/regcomp.c
@@ -3802,9 +3802,9 @@ free_charset (re_charset_t *cset)
# ifdef _LIBC
re_free (cset->coll_syms);
re_free (cset->equiv_classes);
+# endif
re_free (cset->range_starts);
re_free (cset->range_ends);
-# endif
re_free (cset->char_classes);
re_free (cset);
}
--
2.11.0