Re: Making byLine faster: we should be able to delegate this

rumbu via Digitalmars-d Mon, 23 Mar 2015 14:16:10 -0700

On Monday, 23 March 2015 at 19:25:08 UTC, Tobias Pankrath wrote:

I made the same test in C# using a 30 MB plain ASCII text file. Compared to the fastest method proposed by Andrei, the D results are not the best:
D:
readText.representation.count!(c => c == '\n') - 428 ms
byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms

C#:
File.ReadAllLines.Length - 216 ms

Win64, D 2.066.1, Optimizations were turned on in both cases.
The .NET code is clearly not performance-oriented (http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26), so I suspect that the .NET runtime is performing some optimizations under the hood.
Does the C# version validate the input? Using std.file.read instead of readText.representation halves the runtime on my machine.
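A minimal sketch of the two D variants being compared may make the validation point concrete. readText checks the whole file as UTF-8 before returning it, while std.file.read returns the raw bytes, which is a plausible reason for the difference; whether the C# version validates is not established here. The function names are hypothetical.

```d
import std.algorithm.searching : count;
import std.file : read, readText;
import std.string : representation;

// Counts '\n' bytes in an in-memory buffer (hypothetical helper).
size_t countNewlines(const(ubyte)[] data)
{
    return data.count!(c => c == '\n');
}

// No UTF-8 validation: std.file.read returns the file's bytes as void[].
size_t countNewlinesRaw(string path)
{
    return countNewlines(cast(const(ubyte)[]) read(path));
}

// readText validates the whole file as UTF-8 before returning a string,
// which is extra work on top of the newline scan.
size_t countNewlinesValidated(string path)
{
    return countNewlines(readText(path).representation);
}
```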

The source code is available at the link above. Since the C# version works internally with streams and UTF-16 chars, the pseudocode looks like this:


---
initialize a LIST with room for 16 items;
while (!eof)
{
  read 4096 bytes into a buffer;
  decode them to UTF-16 into a wchar[] buffer;
  while (more data in the buffer)
  {
    read from the buffer until (\n or \r\n or \r);
    discard the end of line;
    if (no more space in LIST)
       double its size;
    add the line to LIST;
  }
}
return the number of items in LIST;
---
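For comparison with the D one-liners, here is a rough D rendering of the pseudocode above. It is a sketch of the steps as described, not a port of the .NET source: the names countLinesNaive and countLinesNaiveFile are hypothetical, std.array.Appender stands in for the LIST (it grows its capacity on its own rather than by an explicit doubling step), and std.utf.byWchar does the UTF-16 decoding.

```d
import std.algorithm.iteration : joiner, map;
import std.array : appender;
import std.stdio : File;
import std.utf : byWchar;

// Hypothetical rendering of the pseudocode: decode UTF-8 input to UTF-16,
// split on "\n", "\r\n" or "\r", store every line, return how many were stored.
size_t countLinesNaive(R)(R utf8)
{
    auto lines = appender!(wstring[])();
    lines.reserve(16);                 // start with room for 16 items
    auto current = appender!(wchar[])();
    bool afterCR = false;              // previous code unit was '\r'
    bool inLine = false;               // chars seen since the last terminator

    foreach (wchar c; utf8.byWchar)
    {
        if (c == '\n' && afterCR)
        {
            afterCR = false;           // second half of "\r\n": already stored
            continue;
        }
        afterCR = false;
        if (c == '\n' || c == '\r')
        {
            lines.put(current.data.idup);  // keep the line, drop the terminator
            current.clear();
            inLine = false;
            afterCR = (c == '\r');
        }
        else
        {
            current.put(c);
            inLine = true;
        }
    }
    if (inLine)
        lines.put(current.data.idup);  // last line had no terminator
    return lines.data.length;          // Appender grows as needed
}

// Reads the file in 4096-byte chunks, as in the pseudocode.
size_t countLinesNaiveFile(string path)
{
    auto chars = File(path).byChunk(4096).joiner.map!(b => cast(char) b);
    return countLinesNaive(chars);
}
```

Every line is decoded, copied and stored only to be counted, which is the work the observations below suggest the runtime avoids.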

Since this code is clearly not the best for the task, I looked into the jitted code, and, as I suspected, the .NET runtime seems smart enough to recognize this pattern and does the following:

- the file is mapped into memory using CreateFileMapping
- it does not perform any decoding, since \r and \n are ASCII
- it does not create any list
- it searches incrementally for \r, \r\n and \n using CompareStringA and LOCALE_INVARIANT, and increments a counter at each end of line
- there is no temporary memory allocation, since the search is performed directly on the mapping handle
- it returns the count.
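The steps listed above can be approximated in D with std.mmfile.MmFile. This is a sketch under assumptions: the helper names are hypothetical, and a plain byte comparison replaces CompareStringA, which is a Windows API. As with ReadAllLines, "\r\n" counts as one terminator and a final line without a terminator still counts as a line.

```d
import std.file : getSize;
import std.mmfile : MmFile;

// Counts lines ended by "\n", "\r\n" or "\r", plus a final unterminated line.
// No decoding and no allocation: the bytes are scanned in place.
size_t countLines(const(ubyte)[] data)
{
    size_t lines = 0;
    bool prevCR = false;
    foreach (b; data)
    {
        if (b == '\n')
        {
            if (!prevCR)
                ++lines;           // "\r\n" was already counted at the '\r'
            prevCR = false;
        }
        else if (b == '\r')
        {
            ++lines;
            prevCR = true;
        }
        else
            prevCR = false;
    }
    if (data.length && data[$ - 1] != '\n' && data[$ - 1] != '\r')
        ++lines;
    return lines;
}

// Maps the file read-only and counts lines directly in the mapping.
size_t countLinesMapped(string path)
{
    if (getSize(path) == 0)
        return 0;                  // mapping a zero-length file may fail
    auto mm = new MmFile(path);
    scope (exit) destroy(mm);      // unmap now rather than at GC time
    return countLines(cast(const(ubyte)[]) mm[]);
}
```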
