I'm using fgrep to match many patterns, given with -f, against many lines of text, e.g. 20k+ patterns against 50k+ lines.
Performance seems good. Thanks for that. But I'm missing some tricks to achieve my real goal.

1) I can quickly get matching lines; with -n I see line numbers, with -o only the match, and with --color=always plus some parsing I can see both the full matching line and extract the matched part. An option similar to -n, but reporting line numbers in the pattern file, would suit me very well. Currently, with --color output, I seem to see only the most specific match, but I need to identify ALL patterns that matched a line. How can this be done?

2) My patterns have some metadata associated with them, e.g. an integer id (the patterns are maintained by users in a UI). I would like a solution to 1) to also include this metadata.

Sample data:

patterns.txt:

  42, pattern1, 2012-03-14
  53, pattern2, 2012-05-23
  78, pattern, 2012-05-24

The third pattern overlaps the first two.

tofilter.txt:

  foo
  something-with-pattern1
  something-else-with-pattern2-foo
  bar

Imagined end result of matching patterns.txt against tofilter.txt (it's fine if I need to use tools other than fgrep along the way, as long as it's fast):

  <line-no-from-tofilter.txt>:<full-matching-line-from-tofilter.txt>:<line-no-from-patterns.txt>+

  2:something-with-pattern1:42:78
  3:something-else-with-pattern2-foo:53:78

<full-matching-line-from-tofilter.txt> can be skipped, as I can look the lines up by line number later; it's only included here for overview.

I would love some good suggestions on how to achieve this, or to hear whether support for any of these requirements is already present (but not seen by me) or worth considering.

Regards
Henrik
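Aside, to make the requirement concrete: the output above could be produced outside fgrep with a naive script like the sketch below. This is only an illustration of the desired semantics, not a proposed fgrep feature; the function name and hard-coded data are hypothetical, and the O(patterns x lines) substring loop would likely be too slow for 20k x 50k (a multi-pattern matcher such as an Aho-Corasick automaton would be the scalable equivalent).

```python
def match_all(patterns, lines):
    """Naive sketch: for each input line, collect the ids of ALL patterns
    occurring in it (plain substring test, as with fgrep's fixed strings).
    Returns rows formatted 'lineno:line:id1:id2:...'.
    """
    rows = []
    for lineno, line in enumerate(lines, 1):  # 1-based, like grep -n
        ids = [pid for pid, pat in patterns if pat in line]
        if ids:
            rows.append(":".join([str(lineno), line] + ids))
    return rows

# Sample data from the message above, as (id, pattern) pairs.
patterns = [("42", "pattern1"), ("53", "pattern2"), ("78", "pattern")]
lines = ["foo", "something-with-pattern1",
         "something-else-with-pattern2-foo", "bar"]
for row in match_all(patterns, lines):
    print(row)
# → 2:something-with-pattern1:42:78
#   3:something-else-with-pattern2-foo:53:78
```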
