Nice, boia01! In my same `/usr/share/nim/**.nim` test I get 768 microseconds 
for your version and 2080 us for just doing the memSlices approach. So, a 2.7X 
speed-up, a bit less than the 4X I saw when I last compared the approach in two 
C versions. Maybe unrolling. Dunno.

@alfrednewman - if the statistics of these data files are stable over the whole 
file, you could always stop after the first gigabyte (or maybe less), figure 
out the average line length and use the file size to estimate the number of 
lines, and then it would only be a ~1 second delay. A MemFile knows its size, 
as does each slice, so this is pretty easy to code. Of course, if the first 
gig is not always representative of the remainder that might not work, but it 
sounds like you probably don't need an exact answer. This is sort of a 
simplified version of one of my/jlp765's suggestions. 2.7 * 1.3 GB/s =~ 3.5 
GB/s is faster IO than many people have. So, even using boia01's code you may 
not see a great speed-up.
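
Here is a rough sketch of that sampling idea, in case it helps. The proc name `estimateLineCount` and the `sampleBytes` parameter are made up for illustration; only `memfiles.open`, `memSlices`, and the `size` fields come from the stdlib. It assumes LF line endings and a non-empty file, so treat it as a starting point rather than tested production code.

```nim
import memfiles

proc estimateLineCount(path: string; sampleBytes = 1 shl 30): int =
  ## Hypothetical helper: count lines in roughly the first `sampleBytes`
  ## bytes, then extrapolate to the whole file using the file size.
  var mf = memfiles.open(path)
  defer: mf.close()
  var lines = 0
  var seen = 0                      # bytes covered by the sampled lines
  for slice in memSlices(mf):
    inc lines
    seen += slice.size + 1          # +1 for the '\n' delimiter (assumes LF)
    if seen >= sampleBytes: break   # stop once the sample is big enough
  if lines == 0: return 0
  # average line length = seen / lines; estimate = file size / average
  result = int(mf.size.float * lines.float / seen.float)
```

Calling `estimateLineCount("big.txt")` samples up to the first gigabyte by default; pass a smaller `sampleBytes` if even that delay is too long. If the file is smaller than the sample, the loop just runs to the end and you get the (near) exact count.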
