Hello,

While investigating memory leaks in sed, I think I found one
in gnulib's regex module.
This happens with character sets in multibyte locales: their range arrays
are allocated but never freed, because of a misplaced "#ifdef _LIBC" guard.

It can be reproduced with:
==============================================
$ echo 1 | LC_ALL=en_CA.utf8 ./sed/sed '/[0-9]/p'
[....]
==1176==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x7fdee483c6a0 in __interceptor_realloc ../../../../gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:105
    #1 0x452fc4 in build_range_exp lib/regcomp.c:2779
    #2 0x453f26 in parse_bracket_exp lib/regcomp.c:3250
    #3 0x450b22 in parse_expression lib/regcomp.c:2302
    #4 0x4504c6 in parse_branch lib/regcomp.c:2221
    #5 0x45015e in parse_reg_exp lib/regcomp.c:2173
    #6 0x44ff03 in parse lib/regcomp.c:2141
    #7 0x4474f1 in re_compile_internal lib/regcomp.c:803
    #8 0x444999 in rpl_re_compile_pattern lib/regcomp.c:230
    #9 0x41384e in compile_regex_1 sed/regexp.c:115
    #10 0x413f81 in compile_regex sed/regexp.c:194
    #11 0x406aec in compile_address sed/compile.c:962
    #12 0x40710a in compile_program sed/compile.c:1038
    #13 0x40a21f in compile_string sed/compile.c:1574
    #14 0x415beb in main sed/sed.c:369
    #15 0x7fdee41cc2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x7fdee483c6a0 in __interceptor_realloc ../../../../gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:105
    #1 0x452f7c in build_range_exp lib/regcomp.c:2777
    #2 0x453f26 in parse_bracket_exp lib/regcomp.c:3250
    #3 0x450b22 in parse_expression lib/regcomp.c:2302
    #4 0x4504c6 in parse_branch lib/regcomp.c:2221
    #5 0x45015e in parse_reg_exp lib/regcomp.c:2173
    #6 0x44ff03 in parse lib/regcomp.c:2141
    #7 0x4474f1 in re_compile_internal lib/regcomp.c:803
    #8 0x444999 in rpl_re_compile_pattern lib/regcomp.c:230
    #9 0x41384e in compile_regex_1 sed/regexp.c:115
    #10 0x413f81 in compile_regex sed/regexp.c:194
    #11 0x406aec in compile_address sed/compile.c:962
    #12 0x40710a in compile_program sed/compile.c:1038
    #13 0x40a21f in compile_string sed/compile.c:1574
    #14 0x415beb in main sed/sed.c:369
    #15 0x7fdee41cc2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
==============================================
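
To illustrate the mechanism, here is a minimal standalone sketch. It is
not the gnulib code: 'struct demo_charset', the counting allocator and the
DEMO_LIBC macro (a stand-in for _LIBC, left undefined here as in a
non-glibc build) are all hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant members of re_charset_t.  */
struct demo_charset
{
  int *range_starts;   /* allocated in every build */
  int *range_ends;     /* allocated in every build */
  int *char_classes;
};

/* Number of blocks allocated through demo_alloc and not yet freed.  */
static int live_blocks;

static void *
demo_alloc (size_t n)
{
  void *p = malloc (n);
  if (p != NULL)
    live_blocks++;
  return p;
}

static void
demo_free (void *p)
{
  if (p != NULL)
    {
      assert (live_blocks > 0);
      live_blocks--;
      free (p);
    }
}

/* Allocate a charset with one element in each array.  */
static struct demo_charset *
demo_charset_new (void)
{
  struct demo_charset *cset = demo_alloc (sizeof *cset);
  if (cset == NULL)
    return NULL;
  cset->range_starts = demo_alloc (sizeof (int));
  cset->range_ends = demo_alloc (sizeof (int));
  cset->char_classes = demo_alloc (sizeof (int));
  return cset;
}

/* Guard placement as before the patch: the range arrays are released
   only when DEMO_LIBC is defined, so other builds leak them.  */
static void
free_charset_before (struct demo_charset *cset)
{
#ifdef DEMO_LIBC
  demo_free (cset->range_starts);
  demo_free (cset->range_ends);
#endif
  demo_free (cset->char_classes);
  demo_free (cset);
}

/* Guard placement as after the patch: the range arrays are always
   released, matching the fact that they are always allocated.  */
static void
free_charset_after (struct demo_charset *cset)
{
  demo_free (cset->range_starts);
  demo_free (cset->range_ends);
  demo_free (cset->char_classes);
  demo_free (cset);
}
```

With DEMO_LIBC undefined, calling free_charset_before on a fresh charset
leaves two blocks outstanding (the two range arrays), while
free_charset_after leaves none. That would be consistent with the two
separate leaks in the report above, both allocated from build_range_exp.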


I think the attached patch fixes the issue.
Comments welcome.

-assaf




From b1c219384e22fa97ee6fbad7f831b2dddde18c54 Mon Sep 17 00:00:00 2001
From: Assaf Gordon <assafgor...@gmail.com>
Date: Tue, 31 Jul 2018 12:18:26 -0600
Subject: [PATCH] regex: fix memory leak in multibyte character set regexes.

* lib/regcomp.c (free_charset): Always free range_{starts,ends} member
variables; They are defined in 'struct re_charset_t' even if not _LIBC.
---
 ChangeLog     | 6 ++++++
 lib/regcomp.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 07e970c10..2d245ebc3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2018-07-31  Assaf Gordon <assafgor...@gmail.com>
+
+	regex: fix memory leak in multibyte character set regexes.
+	* lib/regcomp.c (free_charset): Always free range_{starts,ends} member
+	variables; They are defined in 'struct re_charset_t' even if not _LIBC.
+
 2018-07-27  Bruno Haible  <br...@clisp.org>
 
 	iswcntrl: Mention minor problem on macOS.
diff --git a/lib/regcomp.c b/lib/regcomp.c
index 7b5ddaad0..b08a0de6c 100644
--- a/lib/regcomp.c
+++ b/lib/regcomp.c
@@ -3802,9 +3802,9 @@ free_charset (re_charset_t *cset)
 # ifdef _LIBC
   re_free (cset->coll_syms);
   re_free (cset->equiv_classes);
+# endif
   re_free (cset->range_starts);
   re_free (cset->range_ends);
-# endif
   re_free (cset->char_classes);
   re_free (cset);
 }
-- 
2.11.0
