Re: Making byLine faster: we should be able to delegate this

Andrei Alexandrescu via Digitalmars-d Mon, 23 Mar 2015 11:36:44 -0700

On 3/23/15 10:43 AM, rumbu wrote:

On Monday, 23 March 2015 at 15:00:07 UTC, John Colvin wrote:

What would be really great would be a performance test suite for
phobos. D is reaching a point where "It'll probably be fast because we
did it right" or "I remember it being fast-ish 3 years ago when I
wrote a small toy test" isn't going to cut it. Real data is needed,
with comparisons to other languages where possible.


I ran the same test in C# using a 30MB plain ASCII text file. Compared
to the fastest method proposed by Andrei, the D results are not the best:

D:
readText.representation.count!(c => c == '\n') - 428 ms
byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms

C#:
File.ReadAllLines.Length - 216 ms

Win64, D 2.066.1; optimizations were turned on in both cases.

The .NET code is clearly not performance oriented
(http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26).
I suspect that the .NET runtime is performing some optimizations under
the hood.

At this point it gets down to the performance of std.algorithm.count,
which could and should be improved. This code runs about 2.5x faster
than count and brings it into the zone of wc -l, which is probably near
the lower bound achievable:


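  import core.stdc.string : memchr;
  import std.file : readText;
  import std.string : representation;

  size_t linect;  // number of '\n' bytes found so far
  auto bytes = args[1].readText.representation;
  // p < lim also avoids calling memchr on an empty (possibly null) buffer
  for (auto p = bytes.ptr, lim = p + bytes.length; p < lim; )
  {
    // memchr scans [p, lim) for the next newline byte
    auto r = cast(immutable(ubyte)*) memchr(p, '\n', lim - p);
    if (!r) break;
    ++linect;
    p = r + 1;  // resume just past the newline
  }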
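
For anyone who wants to experiment, the loop above can be wrapped in a
standalone function and checked against std.algorithm's count. This is
only a minimal sketch: the name countNewlines is hypothetical (it is not
a Phobos function), and it assumes the input is already in memory as a
slice of bytes.

```d
import core.stdc.string : memchr;

/// Counts '\n' bytes in `data` by repeatedly calling C's memchr.
/// NOTE: `countNewlines` is a hypothetical helper for illustration,
/// not an existing Phobos API.
size_t countNewlines(const(ubyte)[] data) @trusted nothrow @nogc
{
    size_t lines = 0;
    auto p = data.ptr;
    const lim = p + data.length;   // one past the last byte
    while (p < lim)
    {
        // Find the next newline in the remaining bytes [p, lim).
        auto r = cast(const(ubyte)*) memchr(p, '\n', lim - p);
        if (r is null) break;      // no more newlines
        ++lines;
        p = r + 1;                 // continue just past the newline
    }
    return lines;
}
```

Calling it as countNewlines(args[1].readText.representation) should give
the same result as representation.count!(c => c == '\n'), which makes it
easy to time the two side by side on the same file.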

Would anyone want to put some work into accelerating count?


Andrei
