On Monday, 19 March 2012 at 17:23:36 UTC, Andrei Alexandrescu wrote:
On 3/18/12 11:12 PM, Jay Norwood wrote:
I'm timing operations processing 10 2MB text files in parallel. I haven't gotten to the part where I put the words in the map, but I've done enough up to this point to say a few things about the measurements.
Great work! This prompts quite a few bug reports and enhancement suggestions - please submit them to bugzilla.
I don't know if they are bugs. On D.learn I got the explanation that matches.captures.length() just returns the matches for the expressions surrounded by (), so I don't think it can be used to count lines except by looping over the matches. std.algorithm.count works ok, but I was hoping there was something in ctRegex that would make it work as fast as the hand-coded string scan.
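For concreteness, the variants being compared look roughly like this (a sketch only; txt standing in for a buffer already read into memory is an assumption here):

import std.algorithm : count;
import std.regex : ctRegex, matchAll;

size_t viaRegex(const(char)[] txt)
{
    // captures.length only counts ()-groups per match, so counting
    // lines with a regex means walking every match of the pattern
    size_t n = 0;
    foreach (m; matchAll(txt, ctRegex!`\n`))
        ++n;
    return n;
}

size_t viaCount(const(char)[] txt)
{
    return count(txt, '\n'); // works ok
}

size_t viaScan(const(char)[] txt)
{
    size_t n = 0; // hand-coded scan: the speed baseline
    foreach (c; txt)
        if (c == '\n') ++n;
    return n;
}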
Two quick notes:
On the other end of the spectrum is the byLine version of the read. This is way too slow to be promoting in our examples, and if anyone is using it in their code they should instead read chunks ... maybe 1MB, like in my example later below, and then split up the lines themselves.
import std.stdio;
import std.string;

// read files by line ... yikes! don't want to do this
// finished! time: 485 ms
void wcp_byLine(string fn)
{
    auto f = File(fn);
    foreach (line; f.byLine(std.string.KeepTerminator.yes))
    {
        // no per-line work yet; this times the read alone
    }
}
What OS did you use? (The implementation of byLine varies a lot across OSs.)
I'm doing everything on win7-64 right now.
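The chunked variant I mentioned above is roughly this shape (a minimal sketch assuming a 1MB byChunk buffer; carrying lines that span a chunk boundary over to the next chunk is omitted):

import std.algorithm : splitter;
import std.stdio;

void wcp_byChunk(string fn)
{
    auto f = File(fn);
    foreach (chunk; f.byChunk(1024 * 1024))
    {
        // split the chunk on newlines ourselves instead of using byLine;
        // a line spanning two chunks would need to be carried over
        foreach (line; splitter(chunk, cast(ubyte)'\n'))
        {
            // per-line processing would go here
        }
    }
}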
I have wanted for a long time to improve byLine by allowing it to do its own buffering. That means once you've used byLine, it's not possible to stop it, get back to the original File, and continue reading; using byLine is a commitment. That is what most uses of it do anyway.
Ok, this was the good surprise. Reading by chunks was faster than reading the whole file, by several ms.
What may be at work here is cache effects. Reusing the same 1MB may place it in faster cache memory, whereas reading 20MB at once may spill into slower memory.
Yes, I would guess that's the problem. This corei7 has 8MB cache, and the threadpool creates 7 active tasks by default, as I understand it, so even 1MB blocks are on the border when running in parallel. I'll lower the chunk size to some level that seems reasonable and retest.
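The retest would look something like this (a sketch; the 256KB chunk size is only a guess at "reasonable", chosen so seven concurrent tasks stay well inside the 8MB cache):

import std.parallelism : taskPool;
import std.stdio;

void wcp_parallel(string[] files)
{
    // ~7 tasks x 256KB stays well under the 8MB L3; not a measured optimum
    enum chunkSize = 256 * 1024;
    foreach (fn; taskPool.parallel(files, 1))
    {
        auto f = File(fn);
        foreach (chunk; f.byChunk(chunkSize))
        {
            // per-chunk word/line processing would go here
        }
    }
}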
Andrei