On 04/16/2010 02:04 AM, Ivan wrote:
> I used to use
>   grep .
> for removing blank lines, until I realized how slow it is for large
> numbers of lines. So I switched to
>   grep -v '^$'
> , which is as fast as one would expect (well, not with the grep that
> comes with MacOSX 10.5.8 (GNU grep version 2.5.1), but this seems to
> have been fixed sometime between 2.5.1 and 2.6.3).

True. In a UTF-8 locale, grep would need to expand "." into the
appropriate character sets; then it could use the faster single-byte
character set matcher. It's on my todo list.
However, it wouldn't be exactly as fast as your grep -v solution (which
is optimal and preferred), because it would check that a character in
the line is a valid UTF-8 character. In particular, it would be slow
and have false negatives if your document is not UTF-8.
You can also use "LC_ALL=C grep ."; that would be fast and exactly
equivalent to "grep -v '^$'".
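
A quick sketch of that equivalence, assuming a POSIX shell and a grep
that honors LC_ALL. The sample text and the variable names (sample,
out_v, out_c) are made up for illustration:

```shell
# Hypothetical sample input: three non-empty lines separated by blank ones.
sample=$(printf 'alpha\n\nbeta\n\n\ngamma\n')

# Drop empty lines by inverting a match on "line start directly followed by line end".
out_v=$(printf '%s\n' "$sample" | grep -v '^$')

# Keep lines that contain at least one character, matched byte by byte in the C locale.
out_c=$(printf '%s\n' "$sample" | LC_ALL=C grep .)

# Show the filtered result and whether both filters agree on this sample.
printf '%s\n' "$out_v"
if [ "$out_v" = "$out_c" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```

To compare speed on a large file, prefixing each grep invocation with
"time" should be enough to see the difference.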
Paolo