On Monday, 23 March 2015 at 19:25:08 UTC, Tobias Pankrath wrote:
I made the same test in C# using a 30MB plain ASCII text file. Compared to the fastest method proposed by Andrei, the results are not the best:

D:
readText.representation.count!(c => c == '\n') - 428 ms
byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms

C#:
File.ReadAllLines.Length - 216 ms;

Win64, D 2.066.1; optimizations were turned on in both cases.

The .NET code is clearly not performance oriented (http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26); I suspect that the .NET runtime is performing some optimizations under the hood.

Does the C# version validate the input? Using std.file.read instead of readText.representation halves the runtime on my machine.

Source code is available at the link above. Since the C# version works internally with streams and UTF-16 chars, the pseudocode looks like this:

---
initialize a LIST with 16 items;
while (!eof)
{
  read 4096 bytes into a buffer;
  decode them to UTF-16 in a wchar[] buffer;
  while (more data in the buffer)
  {
    read from buffer until (\n or \r\n or \r);
    discard the end of line;
    if (no more space in LIST)
       double its size;
    add the line to LIST;
  }
  }
}
return number of items in the LIST.
---

Since this code is clearly not the best for this task, as I suspected, I looked into the jitted code, and it seems that the .NET runtime is smart enough to recognize this pattern and does the following:
- file is mapped into memory using CreateFileMapping
- does not perform any decoding, since \r and \n are ASCII
- does not create any list
- searches incrementally for \r, \r\n and \n using CompareStringA with LOCALE_INVARIANT, incrementing the count at each end of line; there is no temporary memory allocation, since the search is performed directly on the mapping
- returns the count.
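A minimal sketch of that counting strategy in C, assuming POSIX mmap in place of CreateFileMapping (the CompareStringA detail is Windows-specific and replaced here by a plain byte scan):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count lines by mapping the file into memory and scanning for
 * \n, \r\n and \r in place: no decoding, no list, no copies. */
size_t count_lines_mapped(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;

    struct stat st;
    fstat(fd, &st);
    if (st.st_size == 0) { close(fd); return 0; }

    const char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return 0;

    size_t count = 0;
    for (off_t i = 0; i < st.st_size; i++) {
        if (p[i] == '\n') {
            count++;
        } else if (p[i] == '\r') {
            count++;
            if (i + 1 < st.st_size && p[i + 1] == '\n')
                i++;                     /* \r\n counts as one end of line */
        }
    }
    if (p[st.st_size - 1] != '\n' && p[st.st_size - 1] != '\r')
        count++;                         /* trailing line without an EOL */

    munmap((void *)p, st.st_size);
    return count;
}
```

Since the scan touches each byte once and never allocates, this is essentially the access pattern the jitted code above achieves.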
