On 01/21/2011 08:27 PM, Reuben Thomas wrote:
> This is an interesting suggestion, not just because of performance
> reasons, but because I was trying to interface at the library level,
> while using a decompressor program directly would avoid having to do
> API impedance matching.

Right, and it would also be strictly more powerful than a library interface. For example, it could allow grepping in binary files such as .odt or .doc by running them through a text converter.

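I'd base the choice of a filter strictly on the file name extension. Not using magic numbers avoids the problem of buffering stdin: sniffing the first bytes of a pipe consumes them, so they would have to be saved and replayed before grep could search the data.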
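To illustrate, here is a hypothetical Python sketch (`choose_filter` and the dict-shaped table are assumptions, not grep code). The filter is picked from the name alone, so no input bytes are read before searching starts, and a pipe on stdin never has to be peeked at and replayed. Note that `os.path.splitext` looks only at the last suffix, so "archive.tar.gz" matches a ".gz" entry.

```python
import os

def choose_filter(filename, filters):
    """Return the filter command for filename, or None to search it as-is.

    Only the name is examined; the contents are never opened here, so
    nothing has to be buffered (which matters for pipes such as stdin)."""
    _, extension = os.path.splitext(filename)  # "log.gz" -> ".gz", "-" -> ""
    return filters.get(extension)
```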

There are of course various ways to implement it. For example, you could have a file like ~/.grep.filters or /etc/filters.grep:

.gz gzip -dc
.bz2 bzip2 -dc
.pdf pdftotext
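A minimal Python sketch of reading such a table, assuming the format is exactly what the example suggests (an extension, whitespace, then a command line). The name `parse_filters` is hypothetical, and no comment syntax is handled because the proposal does not define one. Returning argument lists means the commands can be run without a shell.

```python
import shlex

def parse_filters(text):
    """Parse a filter table with one "<extension> <command...>" entry per line.

    Returns a dict mapping each extension (e.g. ".gz") to its command as an
    argument list. Blank lines and lines without a command are skipped;
    a later entry for the same extension replaces an earlier one."""
    filters = {}
    for line in text.splitlines():
        fields = line.split(None, 1)  # the extension, then the rest of the line
        if len(fields) != 2:
            continue
        extension, command = fields
        filters[extension] = shlex.split(command)
    return filters
```

With the last-entry-wins rule, reading /etc/filters.grep first and ~/.grep.filters second (an assumed order) would let a user override system-wide entries.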

This could be combined with a --filters/--no-filters option. Reading the configuration files should be skipped when grepping stdin, to avoid useless stat() calls.
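A sketch of how these rules might fit together, written in Python for brevity rather than following grep's actual C code. `needs_filter_table` and `read_input` are hypothetical names, `use_filters` stands for the proposed --filters/--no-filters switch, and each filter is assumed to read raw data on stdin and write text on stdout (as `gzip -dc` and `bzip2 -dc` do; a converter with another calling convention would need a wrapper). For brevity the output is read into memory; a real implementation would stream it.

```python
import os
import subprocess
import sys

def needs_filter_table(filenames, use_filters):
    """Decide whether to look for ~/.grep.filters and /etc/filters.grep at all.

    With --no-filters, or when only standard input is searched (no file
    operands, or just "-"), the table is never consulted, so the
    stat()/open() calls on the configuration files can be skipped."""
    return use_filters and any(name != "-" for name in filenames)

def read_input(filename, filters):
    """Return the bytes to search for one operand.

    Standard input is passed through unchanged; a named file whose
    extension appears in `filters` is piped through that command."""
    if filename == "-":
        return sys.stdin.buffer.read()
    command = filters.get(os.path.splitext(filename)[1])
    with open(filename, "rb") as raw:
        if command is None:
            return raw.read()
        # Assumption: the filter reads raw data on stdin, writes text on stdout.
        result = subprocess.run(command, stdin=raw, stdout=subprocess.PIPE, check=True)
        return result.stdout
```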

Given that the main use case is "grep -r", another possibility would be to add --filters=recurse and make it the default: "-r" would turn filtering on, while without "-r" it would stay off. I don't think it's worth the complication, though.
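The --filters=recurse alternative amounts to a three-valued setting whose default depends on -r. A hypothetical resolver (the spellings `yes`/`no`/`recurse` and the name `filters_enabled` are assumptions) shows the rule:

```python
def filters_enabled(mode="recurse", recursive=False):
    """Resolve a hypothetical --filters=yes|no|recurse setting.

    "recurse" (the proposed default) enables filtering exactly when -r is
    given; "yes" and "no" (--filters / --no-filters) override it."""
    if mode == "recurse":
        return recursive
    if mode in ("yes", "no"):
        return mode == "yes"
    raise ValueError("unknown --filters mode: %r" % mode)
```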

Paolo
