On Monday, 19 March 2012 at 17:23:36 UTC, Andrei Alexandrescu wrote:
On 3/18/12 11:12 PM, Jay Norwood wrote:
I'm timing operations that process ten 2 MB text files in parallel. I haven't gotten to the part where I put the words in the map, but I've done enough up to this point to say a few things about the measurements.

Great work! This prompts quite a few bug reports and enhancement suggestions - please submit them to bugzilla.

I don't know if they are bugs. On D.learn I got the explanation that matches.captures.length() just returns the captures for the subexpressions surrounded by (), so I don't think it can be used, other than in a loop, to count lines, for example. std.algorithm.count works OK, but I was hoping there was something in ctRegex that would make it work as fast as the hand-coded string scan.
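
For reference, here is a minimal sketch of the std.algorithm.count approach (countLines is a hypothetical helper, not from the original measurements); counting over raw bytes avoids UTF decoding, so it does essentially the same linear walk as a hand-coded scan:

import std.algorithm : count;
import std.file : read;

// count lines by scanning the file's bytes for '\n'
size_t countLines(string fn)
{
    auto bytes = cast(const(ubyte)[]) read(fn);
    return bytes.count('\n');
}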


Two quick notes:

On the other end of the spectrum is the byLine version of the read. This is way too slow to be promoting in our examples; anyone using it in their code should instead read chunks ... maybe 1 MB, like in my example later below ... and then split up the lines themselves (see the chunked sketch after the code below).

// read files by line ... yikes! don't want to do this
// finished! time: 485 ms
void wcp_byLine(string fn)
{
    import std.stdio : File, KeepTerminator;

    auto f = File(fn);
    foreach (line; f.byLine(KeepTerminator.yes))
    {
        // empty body: only the cost of iterating lines is being timed
    }
}
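
For contrast, here is a minimal sketch of the chunked alternative described above (wcp_byChunk is a hypothetical name; the 1 MB chunk size follows the text). The detail byLine otherwise hides is carrying a partial last line from one chunk into the next:

// read 1 MB chunks and split lines ourselves
void wcp_byChunk(string fn)
{
    import std.algorithm : splitter;
    import std.stdio : File;

    auto f = File(fn);
    char[] carry; // unterminated tail of the previous chunk
    foreach (chunk; f.byChunk(1024 * 1024))
    {
        auto buf = carry ~ cast(char[]) chunk;
        // everything after the last '\n' is an incomplete line
        size_t end = buf.length;
        while (end > 0 && buf[end - 1] != '\n') --end;
        foreach (line; buf[0 .. end].splitter('\n'))
        {
            // empty body, as in the byLine version being timed
        }
        carry = buf[end .. $].dup; // byChunk reuses its buffer, so copy
    }
    // any final line without a trailing '\n' is left in carry
}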

What OS did you use? (The implementation of byLine varies a lot across OSs.)

I'm doing everything on Win7-64 right now.



I've wanted for a long time to improve byLine by allowing it to do its own buffering. That would mean that once you've used byLine, it's not possible to stop it, get back to the original File, and continue reading it. Using byLine would be a commitment, but that's what most uses of it do anyway.

Ok, this was the good surprise: reading by chunks was faster than reading the whole file, by several ms.

What may be at work here are cache effects. Reusing the same 1 MB may place it in faster cache memory, whereas reading 20 MB at once may spill into slower memory.

Yes, I would guess that's the problem. This Core i7 has an 8 MB cache, and the thread pool creates 7 active tasks by default, as I understand it, so even 1 MB blocks are on the border when running in parallel. I'll lower the chunk size to a level that seems reasonable and retest.
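
For context, a minimal sketch of that default parallelism with std.parallelism (the file list and the per-file worker are placeholders); taskPool sizes itself to totalCPUs - 1 worker threads, which is where the 7 active tasks come from on an 8-thread Core i7:

import std.parallelism : parallel;

// one task per file on the default pool (totalCPUs - 1 workers)
void wcp_parallel(string[] files)
{
    foreach (fn; parallel(files, 1)) // work unit size 1: one file per task
    {
        wcp_byChunk(fn); // hypothetical per-file worker sketched earlier
    }
}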



Andrei

