I'm running 64-bit builds of grep 2.14 and 2.5.1 on a Red Hat 5.6 box. grep 2.14 is significantly slower than 2.5.1 on a simple regex. The times are:

  grep 2.5.1:   4.39user 3.19system 0:07.60elapsed
  grep 2.14:   25.92user 2.84system 0:28.76elapsed

Details of the test:

  - the grep command line is -i "<name>.*russia"
  - the file is a large XML file with 101,766,751 lines, around 3.6 GB in size
  - there are 14,772 matched lines
  - runs are in the C locale
  - both grep builds have the default configuration

Here are the top callgrind Ir counts:

  grep 2.5.1:
     5,985,715,429  kwset.c:kwsexec
       833,138,736  dfa.c:dfaexec
       360,061,388  ???:memchr
       110,119,157  search.c:EGexecute
        34,010,204  grep.c:grepfile
        32,198,545  ???:__ctype_get_mb_cur_max
        11,459,760  grep.c:fillbuf
         7,175,377  ???:memmove
         3,623,898  grep.c:grepbuf

  grep 2.14:
    36,717,431,504  dfa.c:dfaexec
    15,709,111,428  ???:memchr
    12,363,145,663  kwset.c:kwsexec
     6,483,204,386  dfasearch.c:EGexecute
        14,650,909  ???:memrchr
        10,358,230  main.c:fillbuf
         7,172,801  ???:memmove
         7,162,667  main.c:grepdesc
         4,484,004  main.c:grepbuf
         1,250,200  ???:__ctype_get_mb_cur_max

And the top function call counts:

  grep 2.5.1:
    kwsexec                     1656108
    __ctype_get_mb_cur_max      1547396
    memchr                      1547383
    dfaexec                     1547383
    __ctype_get_mb_cur_max      1547383
    __ctype_get_mb_cur_max      1547383
    __ctype_get_mb_cur_max      1532611
    EGexecute                    124962
    __ctype_get_mb_cur_max       124962
    read                         110191
    grepbuf                      110190
    fillbuf                      110190
    memmove                      110189
    __ctype_get_mb_cur_max       108725
    prtext                        14772
    prline                        14772

  grep 2.14:
    memchr                    101766751
    kwsexec                   101766751
    dfaexec                   101766751
    EGexecute                    124966
    __ctype_get_mb_cur_max       124966
    __ctype_get_mb_cur_max       124966
    read                         110195
    memrchr                      110194
    grepbuf                      110194
    fillbuf                      110194
    memmove                      110193
    prtext                        14772
    prline                        14772

Ratios of Ir counts to function call counts:

  grep 2.5.1:
    dfaexec:   538.42 = 833138736/1547383
    kwsexec:  3614.33 = 5985715429/1656108
    memchr:    232.69 = 360061388/1547383

  grep 2.14:
    dfaexec:   360.80 = 36717431504/101766751
    kwsexec:   121.49 = 12363145663/101766751
    memchr:    154.36 = 15709111428/101766751

Observations:

1. grep 2.14 calls kwsexec, dfaexec and memchr once per line, while 2.5.1 makes far fewer calls to those functions.
2. grep 2.5.1 calls __ctype_get_mb_cur_max many more times than 2.14 but overall spends less time in the function.
3. grep 2.14 calls memrchr while grep 2.5.1 does not.
4. grep 2.5.1 generally passes longer chunks to memchr, thus reducing the overall time it spends in the function.

Is there a runtime option or build-time configuration for grep 2.14 that could give it performance comparable to grep 2.5.1 for the sort of simple regex in my example?

Zartaj
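
For anyone who wants to recheck the Ir-per-call ratios above, here is a minimal Python sketch. The Ir and call counts are copied from the callgrind output in this message; the script itself, including the names COUNTS, ir_per_call and RATIOS, is only an illustration and not part of grep or callgrind.

```python
# Ir (instruction) counts and call counts copied from the callgrind
# output quoted in this message: function -> (Ir count, call count).
COUNTS = {
    "grep 2.5.1": {
        "dfaexec": (833_138_736, 1_547_383),
        "kwsexec": (5_985_715_429, 1_656_108),
        "memchr": (360_061_388, 1_547_383),
    },
    "grep 2.14": {
        "dfaexec": (36_717_431_504, 101_766_751),
        "kwsexec": (12_363_145_663, 101_766_751),
        "memchr": (15_709_111_428, 101_766_751),
    },
}


def ir_per_call(ir_count, call_count):
    """Average number of instructions executed per call."""
    return ir_count / call_count


# Hypothetical helper table: version -> function -> Ir per call.
RATIOS = {
    version: {name: ir_per_call(ir, calls) for name, (ir, calls) in funcs.items()}
    for version, funcs in COUNTS.items()
}

if __name__ == "__main__":
    for version, funcs in RATIOS.items():
        print(version)
        for name, ratio in funcs.items():
            print(f"  {name:<8} {ratio:10.2f}")
```

The per-call cost is lower in 2.14 for all three functions, so the slowdown comes from the roughly 60x higher call count rather than from each call being more expensive.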
