forcemerge 195719 358858
thanks

Hello,
I investigated this memory leak and did not find any evidence of one: a large amount of memory is used, but it is not leaked. Thus I'm merging #195719 and #358858, which are both about grep leaking memory. Consequently, the severity of #358858 is downgraded from important to normal.

I also recommend either closing these bugs, or tagging them wontfix and changing the title to "grep uses too much memory". Alternatively, a wishlist --lowmem option could be implemented.

I'm answering below some of the comments from #195719 and #358858.

grep uses a DFA algorithm to perform regexp matching. This DFA is implemented either in grep itself or in the libc (when re_search is used); which one is used depends on the version of grep and on the grep options. The DFA is a state machine, and each time it is used, the automaton representing the regexp may use more memory because a new transition is explored. The memory allocated for the automaton is not freed after each line is parsed: it is kept so that, if the same transition path is used again in a later line, processing will be faster. Thus more and more memory is allocated as the input exercises new transitions.

This explains Justin's concern in #195719:

On Thu, Nov 25, 2004 at 06:38:53PM -0500, Justin Pryzby wrote:
>
> What I don't understand and cannot explain is why memory usage begins
> to climb linearly *after* the compilation. It seems to me that it
> should do the exact same thing once per input line, and that should
> never require more memory (assuming input lines are about the same
> length, which they are). Thus I suspect some other problem.
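To illustrate why memory grows without leaking, here is a minimal Python sketch of a lazily built DFA. It is neither grep's dfa.c nor the libc regex code. DFA states (sets of NFA positions) and transitions are computed the first time the input needs them and then cached for the lifetime of the matcher. The class name LazyDFA and the restriction to patterns made of literal characters and '.' are assumptions made to keep it short.

```python
class LazyDFA:
    """Unanchored matcher for patterns of literal characters and '.'.

    Hypothetical illustration: DFA states and transitions are built on
    demand and kept until the object itself is dropped, never per line.
    """

    def __init__(self, pattern):
        self.pattern = pattern
        self.start = frozenset({0})   # NFA position 0: nothing matched yet
        self.states = {self.start}    # every DFA state built so far
        self.trans = {}               # cache: (state, char) -> next state

    def _step(self, state, ch):
        """Compute the successor of `state` on `ch` (the expensive part)."""
        nxt = {0}  # unanchored search: a match may start at any character
        for pos in state:
            if pos < len(self.pattern) and self.pattern[pos] in (".", ch):
                nxt.add(pos + 1)
        return frozenset(nxt)

    def next_state(self, state, ch):
        key = (state, ch)
        if key not in self.trans:     # new transition: build and remember it
            target = self._step(state, ch)
            self.states.add(target)
            self.trans[key] = target
        return self.trans[key]

    def search(self, line):
        """Return True if the pattern occurs somewhere in `line`."""
        final = len(self.pattern)
        state = self.start
        if final in state:            # empty pattern matches everything
            return True
        for ch in line:
            state = self.next_state(state, ch)
            if final in state:
                return True
        return False


if __name__ == "__main__":
    dfa = LazyDFA("1.2.3.4")
    log = ["10.0.0.1 - - GET /", "10.0.0.1 - - GET /", "192.168.9.9 - - POST /x"]
    for line in log:
        dfa.search(line)
        # The repeated line adds nothing; the new characters add transitions.
        print(f"{len(dfa.states)} states, {len(dfa.trans)} cached transitions")
```

Processing the same line twice adds no transitions, while a line with new characters does, so memory climbs with the variety of the input rather than with the number of lines. Dropping the matcher (the analogue of regfree) releases everything, which is why such memory is still reachable rather than lost.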
Also, in #195719:

On Thu, Dec 10, 2005 at 20:08:28 -0500, Justin Pryzby wrote:
>
> The culprit seems to be
>
> 1024816 bytes in 2596 blocks are still reachable in loss record 18 of 18
>    at 0x1B90459D: malloc (vg_replace_malloc.c:130)
>    by 0x804C5EC: xmalloc (dfa.c:143)
>    by 0x804E9BB: build_state (dfa.c:2330)
>    by 0x804FC7E: dfaexec (dfa.c:2372)
>    by 0x8054DE2: EGexecute (search.c:402)
>    by 0x804A6C5: grepbuf (grep.c:732)
>    by 0x804B1A5: grepfile (grep.c:851)
>    by 0x804C39D: main (grep.c:1788)
>
> (valgrind --leak-check=full --show-reachable=yes
> ./build-tree/grep-2.5.1/src/grep -f /tmp/words /usr/share/dict/words)

I could not reproduce exactly the same trace: grep's own DFA was used there, whereas grep currently uses re_search from the libc, so the `leak' is now in the libc.

Given that /tmp/words was composed of words longer than 16 characters, the DFA had to build one automaton per line of /tmp/words, each of which should have more than 16 states. If there are 150 lines in /tmp/words, that gives more than 150 * 16 = 2400 states, close to the 2596 blocks allocated by build_state. As the automata are used up to the last line, their buffers are not freed by grep, but this is not a bug. Adding a regfree at the end of grep shows that this valgrind record is then freed (hence the memory is not leaking).

In this use case, I would recommend using the -F option, as Justin mentioned in #195719. This avoids using an automaton.

In #358858, the search pattern is an IP address, but John used "1.2.3.4", where each unescaped dot matches any character. To match an IP address, again, the -F option could (should?) be used, or at least the regular expression should be tightened to "1\.2\.3\.4" (the maximum number of transitions is then 7*256, versus probably something like 256*256*256*256). So I'm not really surprised by 200M of memory usage.

Kind Regards,
--
Nekral
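The effect of the unescaped dots can be checked with a small Python sketch using the standard re module. This is Python's regex engine, not grep's, and the sample lines are made up for illustration. "1.2.3.4" also matches text that is not that IP address. The escaped pattern and a plain substring test (the analogue of grep -F) match only the literal text.

```python
import re

# Hypothetical sample lines, e.g. from a log file.
lines = [
    "client 1.2.3.4 connected",
    "order 10203040 shipped",   # not the IP, but each "." matched a "0"
    "client 1x2y3z4",           # not the IP, "." matched letters
]

loose = re.compile(r"1.2.3.4")     # pattern as used in #358858
tight = re.compile(r"1\.2\.3\.4")  # escaped dots only match a literal "."
fixed = "1.2.3.4"                  # fixed-string search, like grep -F


def matches(line):
    """Return (loose regex, tight regex, fixed string) match results."""
    return (bool(loose.search(line)), bool(tight.search(line)), fixed in line)


for line in lines:
    print(line, matches(line))
```

Only the first line is the address actually searched for; the loose pattern also accepts the other two, while the tight pattern and the fixed-string test reject them.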

