On 04/16/2010 02:04 AM, Ivan wrote:
I used to use

grep .

for removing blank lines, until I realized how slow it is for large
numbers of lines. So I switched to

grep -v '^$'

, which is as fast as one would expect (well, not with the grep that
comes with MacOSX 10.5.8 (GNU grep version 2.5.1), but this seems to
have been fixed sometime between 2.5.1 and 2.6.3).

True. In a UTF-8 locale, the period would need to be expanded into the appropriate character sets; then grep could use the faster single-byte character set matcher. It's on my todo list.

It wouldn't be exactly as fast as your grep -v solution (which is optimal and preferred), however, because it would still check that a character in the line is a valid UTF-8 character. In particular, it would be slow and have false negatives if your document is not UTF-8.

You can also use "LC_ALL=C grep ."; that would be fast and exactly equivalent to "grep -v '^$'".
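
If you want to convince yourself that the two commands agree, something like the following should do. It is only a rough sketch: the sample file and the variable names are made up for illustration, and it checks the output on a tiny ASCII input, not the speed.

```shell
# Build a small sample file containing some blank lines.
# (The temporary file is just an illustration.)
sample=$(mktemp)
printf 'alpha\n\nbeta\n\n\ngamma\n' > "$sample"

# In the C locale every byte counts as a character, so "." matches
# any non-empty line without any multibyte validation.
out_dot=$(LC_ALL=C grep . "$sample")

# The inverted match drops exactly the empty lines.
out_inv=$(grep -v '^$' "$sample")

if [ "$out_dot" = "$out_inv" ]; then
    echo "same output"
else
    echo "outputs differ"
fi

rm -f "$sample"
```

To compare speed, you could run each command under "time" on a file with a large number of lines.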

Paolo

