I have changed the code to parallelize over files rather than lines. The code
is available if anyone is interested.
However, the speed is still not satisfactory (total processing speed is
approx. 10M/s; ideally it should be 100M/s, the network speed).
Neither the CPU nor the IO is saturated, and I cannot find the bottleneck...
@Jeremy, thanks for the reply. The bottleneck is IO. It takes days just to
stream all the files at full speed, so waiting for a whole file to load
before processing it wastes a lot of time. Ideally, the processing would be
finished by the time a single streaming pass over the data completes, with
no extra time on top.
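A minimal sketch of the one-pass idea, assuming Julia 1.3 or later. The names
`process_stream`, `process_line`, and `buffer` are hypothetical placeholders,
not part of the actual code; a reader task feeds lines into a bounded
`Channel` while the consumer processes them, so reading and processing can
overlap:

```julia
# Sketch only: `process_line` is a hypothetical placeholder for the real work.
function process_stream(process_line, io::IO; buffer::Int = 1024)
    # Producer task: reads lines and queues each one as soon as it arrives.
    lines = Channel{String}(buffer; spawn = true) do ch
        for line in eachline(io)
            put!(ch, line)
        end
    end
    # Consumer: handles queued lines while the producer is still reading.
    return [process_line(line) for line in lines]
end

# Usage with an in-memory buffer standing in for a network stream.
lengths = process_stream(length, IOBuffer("ab\ncde\n"))
```

The bounded channel keeps the reader from running far ahead of the
processing, so memory use stays limited even for very large files.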
@Páll, do you mean that pmap will first do a ``collect`` operation and only
then start processing? So even if you give pmap an iterator, it will not
benefit from it? That would be sad.