I'm using fgrep to match many patterns, given with -f, against many lines of text, e.g. 20k+ patterns against 50k+ lines.
Performance seems good. Thanks for that. But I'm missing some tricks to achieve my real goal.

1) I can quickly get matching lines; with -n I see line numbers, with -o only the match, and with --color=always plus some parsing I can see both the full matching line and extract the matched part. An option similar to -n, but reporting line numbers in the pattern file, would suit me very well. Currently, with --color output, I seem to see only the most specific match, but I need to identify ALL patterns that matched a line. How can this be done?

2) My patterns have some metadata associated with them, e.g. an integer id (the patterns are maintained by users in a UI). I would like a solution to 1) to also include this metadata.

Sample data:

patterns.txt:

  42, pattern1, 2012-03-14
  53, pattern2, 2012-05-23
  78, pattern, 2012-05-24

The third pattern overlaps the first two.

tofilter.txt:

  foo
  something-with-pattern1
  something-else-with-pattern2-foo
  bar

Imagined end result of matching patterns.txt against tofilter.txt (it's fine if I need to use tools other than fgrep along the way, as long as it's fast):

  <line-no-from-tofilter.txt>:<full-matching-line-from-tofilter.txt>:<line-no-from-patterns.txt>+

  2:something-with-pattern1:42:78
  3:something-else-with-pattern2-foo:53:78

<full-matching-line-from-tofilter.txt> can be skipped, as I can look the lines up by line number later; it's only included here for overview.

I would love some good suggestions on how to achieve this, or to hear whether support for any of these requirements is already present (but not seen by me) or worth considering.

Regards
Henrik
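Aside, to make the requirement concrete: the output above could be produced outside fgrep with a naive script like the sketch below. This is only an illustration of the desired semantics, not a proposed fgrep feature; the function name and hard-coded data are hypothetical, and the O(patterns x lines) substring loop would likely be too slow for 20k x 50k (a multi-pattern matcher such as an Aho-Corasick automaton would be the scalable equivalent).

```python
def match_all(patterns, lines):
    """Naive sketch: for each input line, collect the ids of ALL patterns
    occurring in it (plain substring test, as with fgrep's fixed strings).
    Returns rows formatted 'lineno:line:id1:id2:...'.
    """
    rows = []
    for lineno, line in enumerate(lines, 1):  # 1-based, like grep -n
        ids = [pid for pid, pat in patterns if pat in line]
        if ids:
            rows.append(":".join([str(lineno), line] + ids))
    return rows

# Sample data from the message above, as (id, pattern) pairs.
patterns = [("42", "pattern1"), ("53", "pattern2"), ("78", "pattern")]
lines = ["foo", "something-with-pattern1",
         "something-else-with-pattern2-foo", "bar"]
for row in match_all(patterns, lines):
    print(row)
# → 2:something-with-pattern1:42:78
#   3:something-else-with-pattern2-foo:53:78
```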
