On 06.07.2015 at 14:42, Nguyễn Thái Ngọc Duy wrote:
Noticed-by: Plamen Totev <plamen.to...@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
---
grep.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/grep.c b/grep.c
index b58c7c6..48db15a 100644
--- a/grep.c
+++ b/grep.c
@@ -378,7 +378,7 @@ static void free_pcre_regexp(struct grep_pat *p)
}
#endif /* !USE_LIBPCRE */
-static int is_fixed(const char *s, size_t len)
+static int is_fixed(const char *s, size_t len, int ignore_icase)
{
size_t i;
@@ -391,6 +391,13 @@ static int is_fixed(const char *s, size_t len)
for (i = 0; i < len; i++) {
if (is_regex_special(s[i]))
return 0;
+ /*
+ * The builtin substring search can only deal with case
+ * insensitivity in ascii range. If there is something outside
+ * of that range, fall back to regcomp.
+ */
+ if (ignore_icase && (unsigned char)s[i] >= 128)
+ return 0;
How about "!isascii(s[i])" instead of the cast and comparison?
}
return 1;
@@ -398,18 +405,19 @@ static int is_fixed(const char *s, size_t len)
static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
{
+ int ignore_icase = opt->regflags & REG_ICASE || p->ignore_case;
int err;
p->word_regexp = opt->word_regexp;
p->ignore_case = opt->ignore_case;
Using p->ignore_case before this line, as the initialization of the new
variable ignore_icase above does, changes the meaning: at that point
p->ignore_case has not yet been copied from opt->ignore_case.
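To make the ordering issue concrete, here is a minimal standalone sketch.
The structs opt_sketch and pat_sketch and both function names are
hypothetical stand-ins, not git's real types; only the order of the read
and the assignment matters.

```c
#include <regex.h>

/* Hypothetical stand-ins for the relevant fields of grep_opt and grep_pat. */
struct opt_sketch { int regflags; int ignore_case; };
struct pat_sketch { int ignore_case; };

/* Mirrors the patch: p->ignore_case is read before it is assigned. */
static int icase_read_early(struct pat_sketch *p, const struct opt_sketch *opt)
{
	int ignore_icase = opt->regflags & REG_ICASE || p->ignore_case;

	p->ignore_case = opt->ignore_case;
	return ignore_icase;
}

/* Possible fix: compute the flag only after the assignment. */
static int icase_read_late(struct pat_sketch *p, const struct opt_sketch *opt)
{
	int ignore_icase;

	p->ignore_case = opt->ignore_case;
	ignore_icase = opt->regflags & REG_ICASE || p->ignore_case;
	return ignore_icase;
}
```

With opt->ignore_case set, REG_ICASE clear and a zero-initialized pattern,
the early variant yields 0 and the late one yields 1.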
- if (opt->fixed || is_fixed(p->pattern, p->patternlen))
+ if (opt->fixed || is_fixed(p->pattern, p->patternlen, ignore_icase))
p->fixed = 1;
else
p->fixed = 0;
if (p->fixed) {
- if (opt->regflags & REG_ICASE || p->ignore_case)
+ if (ignore_case)
Should this be ignore_icase instead? ignore_case is the variable for the
config setting core.ignorecase. Tricky.
p->kws = kwsalloc(tolower_trans_tbl);
else
p->kws = kwsalloc(NULL);
So the optimization before this patch was that if a string was searched
for without -F then it would be treated as a fixed string anyway unless
it contained regex special characters. Searching for fixed strings
using the kwset functions is faster than using regcomp and regexec,
which makes the exercise worthwhile.
Your patch disables the optimization if non-ASCII characters are
searched for because kwset handles case transformations only for ASCII
chars.
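As a rough illustration of that decision, here is a standalone sketch.
is_regex_special_sketch() is a hypothetical approximation of git's
is_regex_special(), and the sketch leaves out any other checks the real
is_fixed() performs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical approximation of git's is_regex_special(); the exact
 * character set used by git is an assumption here.
 */
static int is_regex_special_sketch(char c)
{
	return c != '\0' && strchr("$()*+.?[\\^{|", c) != NULL;
}

/*
 * Returns 1 if the pattern can be handed to a plain substring search,
 * 0 if it has to go through regcomp.
 */
static int is_fixed_sketch(const char *s, size_t len, int ignore_icase)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (is_regex_special_sketch(s[i]))
			return 0;
		/* Case folding in the substring search is ASCII-only. */
		if (ignore_icase && (unsigned char)s[i] >= 128)
			return 0;
	}
	return 1;
}
```

A pattern like "foo" stays on the fast path with or without case folding,
while a UTF-8 encoded "ä" stays on it only for case-sensitive searches.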
Another consequence of this limitation is that -Fi (explicit
case-insensitive fixed-string search) doesn't work properly with
non-ASCII chars either. How can we handle this one? Fall back to
regcomp by escaping all special characters? Or at least warn?
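The escaping idea could look roughly like the sketch below. It assumes
POSIX extended syntax (REG_EXTENDED), because the set of characters that
may safely be backslash-escaped differs between basic and extended
regexes. Both function names are hypothetical, patterns with embedded
NULs are not handled, and whether non-ASCII characters actually get
case-folded still depends on the locale and the regex implementation.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Characters that are special in POSIX extended regular expressions. */
static const char ere_special[] = ".[\\()*+?{|^$";

/*
 * Returns a malloc'd, NUL-terminated copy of s with a backslash in
 * front of every ERE special character, or NULL on allocation failure.
 * The caller frees the result.
 */
static char *escape_ere_sketch(const char *s, size_t len)
{
	char *out = malloc(2 * len + 1);
	size_t i, j = 0;

	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		if (s[i] != '\0' && strchr(ere_special, s[i]))
			out[j++] = '\\';
		out[j++] = s[i];
	}
	out[j] = '\0';
	return out;
}

/*
 * Returns 1 if needle occurs literally in haystack, ignoring case,
 * 0 if it does not, and -1 on error.
 */
static int icase_literal_match_sketch(const char *needle, const char *haystack)
{
	regex_t re;
	char *escaped = escape_ere_sketch(needle, strlen(needle));
	int ret;

	if (!escaped)
		return -1;
	if (regcomp(&re, escaped, REG_EXTENDED | REG_ICASE | REG_NOSUB)) {
		free(escaped);
		return -1;
	}
	ret = regexec(&re, haystack, 0, NULL, 0) == 0;
	regfree(&re);
	free(escaped);
	return ret;
}
```

With this, a needle such as "1+1" matches only the literal text "1+1",
not "11", while case is still ignored for letters.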
Tests would be nice. :)
René