On 18/03/15 17:24, Bernhard Voelker wrote:
> On 03/18/2015 04:57 PM, Pádraig Brady wrote:
>> +  wc -l is now up to 6 times faster with short lines.
> 
>> diff --git a/src/wc.c b/src/wc.c
>> index 8cb5163..8125100 100644
>> --- a/src/wc.c
>> +++ b/src/wc.c
>> @@ -264,6 +264,8 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, 
>> off_t current_pos)
>>       {
>>         /* Use a separate loop when counting only lines or lines and bytes --
>>            but not chars or words.  */
>> +      bool long_lines = false;
>> +      bool check_len = true;
>>         while ((bytes_read = safe_read (fd, buf, BUFFER_SIZE)) > 0)
>>           {
>>             char *p = buf;
>> @@ -275,12 +277,39 @@ wc (int fd, char const *file_x, struct fstatus 
>> *fstatus, off_t current_pos)
>>                 break;
>>               }
>>
>> +          char *end = p + bytes_read;
>> +          char *line_start = p;
>> +
>> +          /* Avoid function call overhead for shorter lines.  */
>> +          if (check_len)
>> +            while (p != end)
>> +              {
>> +                lines += *p++ == '\n';
>> +                /* If there are more than 300 chars in the first 10 lines,
>> +                   then use memchr, where system specific optimizations
>> +                   may outweigh any function call overhead.  */
>> +                if (lines <= 10)
>> +                  {
>> +                    if (p - line_start > 300)
>> +                      {
>> +                        long_lines = true;
>> +                        break;
>> +                      }
>> +                  }
>> +              }
>> +          else if (! long_lines)
>> +            while (p != end)
>> +              lines += *p++ == '\n';
>> +
> 
> Doesn't this run into the memchr loop in both cases
> (which the compiler seems to optimize away, but looks odd)?

There will be a single memchr(,,0) call per block which is insignificant.

>> +          /* memchr is more efficient with longer lines.  */
>>             while ((p = memchr (p, '\n', (buf + bytes_read) - p)))
>>               {
>>                 ++p;
>>                 ++lines;
>>               }
>> +
>>             bytes += bytes_read;
>> +          check_len = false;
>>           }
>>       }
>>   #if MB_LEN_MAX > 1
> 
> In my first tests I can second the speedup for short lines.
> 
> Where do you have the magic 30 from?
> Tests here show that the effect reverses with line length
> between 10 and ~27, i.e., the new version is ~80% slower.
> Beyond that, both versions behave the same, as expected.
> I have to test more this night.

Yes 30 was a bit aggressive.
I had already changed it to 15 (150) locally here after more testing.

Latest attached.

thanks for the careful review!
Pádraig.
From 868c6a152dc6c11f59a7ba0a5dacdd909958dfd0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kristoffer=20Br=C3=A5nemyr?= <[email protected]>
Date: Wed, 18 Mar 2015 15:32:19 +0000
Subject: [PATCH] wc: speedup counting of short lines

Using a test file generated with:
  yes | head -n100M > 2x100M.txt

before> time wc -l 2x100M.txt
  real 0.842s
  user 0.810s
  sys  0.033s

after> time wc -l 2x100M.txt
  real 0.142s
  user 0.111s
  sys  0.031s

* src/wc.c (wc): Split the loop that deals with -l into 3.
The first is used at the start of the input to determine if
the average line length is < 15, and if so the second loop is
used to look for '\n' internally to wc.  For longer lines,
memchr is used as before to take advantage of system specific
optimizations which any outweigh function call overhead.
Note the first 2 loops could be combined, though in testing,
GCC 4.9.2 at least, wasn't sophisticated enough to separate
the loops based on the "check_len" invariant.
Note also __builtin_memchr() isn't significant here as
GCC currently only applies constant folding with that.
* NEWS: Mention the improvement.
---
 NEWS     |  2 ++
 src/wc.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/NEWS b/NEWS
index 1a51235..81031c6 100644
--- a/NEWS
+++ b/NEWS
@@ -94,6 +94,8 @@ GNU coreutils NEWS                                    -*- outline -*-
   stat and tail now know about IBRIX.  stat -f --format=%T now reports the file
   system type, and tail -f uses polling for files on IBRIX file systems.
 
+  wc -l processes short lines much more efficiently.
+
   References from --help and the man pages of utilities have been corrected
   in various cases, and more direct links to the corresponding online
   documentation are provided.
diff --git a/src/wc.c b/src/wc.c
index 8cb5163..a00c9f6 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -264,6 +264,8 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
     {
       /* Use a separate loop when counting only lines or lines and bytes --
          but not chars or words.  */
+      bool long_lines = false;
+      bool check_len = true;
       while ((bytes_read = safe_read (fd, buf, BUFFER_SIZE)) > 0)
         {
           char *p = buf;
@@ -275,12 +277,38 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
               break;
             }
 
+          char *end = p + bytes_read;
+
+          /* Avoid function call overhead for shorter lines.  */
+          if (check_len)
+            while (p != end)
+              {
+                lines += *p++ == '\n';
+                /* If there are more than 150 chars in the first 10 lines,
+                   then use memchr, where system specific optimizations
+                   may outweigh function call overhead.  */
+                if (lines <= 10)
+                  {
+                    if (p - buf > 150)
+                      {
+                        long_lines = true;
+                        break;
+                      }
+                  }
+              }
+          else if (! long_lines)
+            while (p != end)
+              lines += *p++ == '\n';
+
+          /* memchr is more efficient with longer lines.  */
           while ((p = memchr (p, '\n', (buf + bytes_read) - p)))
             {
               ++p;
               ++lines;
             }
+
           bytes += bytes_read;
+          check_len = false;
         }
     }
 #if MB_LEN_MAX > 1
-- 
2.1.0

Reply via email to