On Sat, Aug 03, 2013 at 11:29:01PM +0200, monarch_dodra wrote:
> On Friday, 2 August 2013 at 23:51:27 UTC, H. S. Teoh wrote:
> >On Fri, Aug 02, 2013 at 06:38:20PM -0500, captaindet wrote:
> >[...]
> >>FWIW
> >>i have to deal with big data files that can be a few GB. for some
> >>data analysis software i wrote in C a while back i did some testing
> >>with caching and such. turns out that for Win7-64 the automatic
> >>caching done by the OS is really good and any attempt to speed
> >>things up actually slowed it down. no kidding, i have seen more than
> >>2GB of data being automatically cached. of course the system RAM
> >>must be larger than the file size (if i remember my tests correctly
> >>by a factor of ~2, but this is maybe not a linear relationship, i
> >>did not actually change the RAM just the size of the data file) and
> >>it will hold it in the cache only as long as there are no concurrent
> >>applications requiring RAM or caching. i guess my point is, if your
> >>target is Win7 and your files are >5x smaller than the installed RAM
> >>i would not bother at all trying to optimize file access. i suppose
> >>-nix machine will do a similar good job these days.
> >[...]
> >
> >IIRC, Linux has been caching files (or disk blocks, rather) in memory
> >since the days of Win95. Of course, memory in those days was much
> >scarcer, but file sizes were smaller too. :) There's still a cost to
> >copy the kernel buffers into userspace, though, which should not be
> >disregarded. But if you use mmap, then you're essentially accessing
> >that memory cache directly, which is as good as it gets.
> >
> >I don't know how well mmap works on windows, though, IIRC it doesn't
> >have the same semantics as Posix, so you could accidentally run into
> >performance issues by using it the wrong way on windows.
[...]
> I did some benching a while back with user bioinfornatics. He had to
> do some pretty large file reads, preferably in very little time.
> Observations showed my algo was *much* faster under windows than
> linux.

Sorry, I lost the context of this discussion, what algo are you
referring to?


> What we observed is that under windows, as soon as you open a file
> for reading, windows starts buffering the file in a parallel thread.
>
> What we did was create two threads. The first did nothing but read
> the file, store it into chunks of memory, and then pass it to a
> worker thread. The worker thread did the parsing proper.
>
> Doing this *halved* the linux runtime, tying it with the
> "monothreaded" windows run time. Windows saw no change.

Interesting. I wonder if you could, under Linux, mmap a file then have
one thread access the first byte of each file block while another thread
does the real work with the data.


> FYI, the full thread is here:
> forum.dlang.org/thread/gmfqwzgtjfnqiajgh...@forum.dlang.org

I'll take a look, thanks.


T

--
The diminished 7th chord is the most flexible and fear-instilling chord.
Use it often, use it unsparingly, to subdue your listeners into
submission!
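
P.S. Here is a rough sketch of the mmap idea above, in case it helps make
it concrete. It is written in Python only for brevity, and it has not been
benchmarked. The names (prefetch, count_newlines, process) are made up for
illustration, the "block" size is assumed to be the VM page size, and
counting newlines is just a stand-in for the real parsing. Note also that,
as far as I know, CPython holds the GIL while a page fault is serviced, so
this shows the structure only; any real overlap would need native threads
(D, C, etc.).

```python
import mmap
import os
import threading

# Assumption: use the VM page size as the "file block" granularity.
# The filesystem block size or readahead window may well differ.
BLOCK = mmap.PAGESIZE


def prefetch(mm, size, step=BLOCK):
    """Touch the first byte of each block so the kernel faults it in."""
    touched = 0
    for off in range(0, size, step):
        mm[off]  # the read itself is the point; the value is discarded
        touched += 1
    return touched


def count_newlines(mm, size, chunk=1 << 20):
    """Stand-in for the real work: count newline bytes, chunk by chunk."""
    total = 0
    for off in range(0, size, chunk):
        total += mm[off:off + chunk].count(b"\n")
    return total


def process(path):
    """Map the file read-only, run the toucher thread alongside the worker."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mapped
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            toucher = threading.Thread(target=prefetch, args=(mm, size))
            toucher.start()
            result = count_newlines(mm, size)  # worker runs in this thread
            toucher.join()  # must finish before the mapping is closed
    return result
```

Calling process("some_big_file.dat") (hypothetical file name) returns the
newline count. Whether the toucher thread actually buys anything over the
kernel's own readahead is exactly the open question, so treat this as
something to measure, not a recommendation.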