This topic can be confusing to beginners because the word "read" is overloaded. 
There is a "system call" named "read", but also many higher level APIs that are 
"buffered" which are also called "read", and people often refer to the general 
activity as just "reading". Buffering means those higher level APIs do "big 
reads" into a buffer and then copy small parts out of it for you, one line at a 
time. If data is copied out shortly after a system-call read brought it into 
that big buffer, the copy is cheap because the data is usually still in the CPU 
caches (L1/L2/L3).
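To make the "big reads" idea concrete, here is a minimal, Unix-only sketch that calls the `read` system call directly through Nim's `posix` module and then scans the buffer in user space. The proc name `countLines` and the 64 KiB buffer size are just illustrative choices, not anything from your program.

```nim
import posix

proc countLines(path: string): int =
  ## Counts '\n' bytes using a few large read(2) calls (hypothetical helper).
  var buf: array[65536, char]            # the "big buffer", re-used every pass
  let fd = posix.open(path.cstring, O_RDONLY)
  if fd < 0:
    quit("Cannot open: " & path)
  while true:
    # One system call fetches up to 64 KiB at a time.
    let got = posix.read(fd, addr buf[0], buf.len)
    if got <= 0: break                   # 0 = end of file, < 0 = error
    for i in 0 ..< got:                  # scanning the buffer needs no system calls
      if buf[i] == '\n': inc result
  discard posix.close(fd)
```

The higher level buffered APIs do roughly this for you, plus the bookkeeping for lines that straddle two buffers.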

There are also "zero copy" APIs to do IO like "mmap" on Unix, or in the Nim 
world "memfiles". These let the OS itself do the "buffering" of what is in RAM 
vs "on disk" (or other persistent device like "flash memory" these days).
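As a rough sketch of the `memfiles` route (assuming one non-negative decimal number per line, as in your file), something like the following walks the mapped file without building a string per line. The name `ytzMem` and the hand-rolled digit loop are my own illustration.

```nim
import memfiles, tables

proc ytzMem(path: string): uint =
  ## Largest per-number sum, parsed straight out of the memory-mapped file.
  var table = initTable[uint, uint]()
  var mf = memfiles.open(path)           # maps the file instead of reading it
  for slice in memSlices(mf):            # each slice points into the mapping
    if slice.size == 0: continue         # skip empty lines
    let p = cast[ptr UncheckedArray[char]](slice.data)
    var n: uint = 0
    for i in 0 ..< slice.size:           # parse decimal digits in place
      if p[i] notin {'0'..'9'}:
        quit("Parse error")
      n = n * 10 + uint(ord(p[i]) - ord('0'))
    table.mgetOrPut(n, 0'u) += n
  mf.close()
  for val in table.values:
    result = max(result, val)
```

Whether this beats buffered reading on a 900 KB file is something you would have to measure.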

Your linked yahtzee file is only 900 KB with average line length of about 9 
bytes, but only 790 unique entries (so a small Table). With such a short line 
length you might profit from not creating new strings at all but re-using the 
same one over and over in the loop. That is because string creation and 
anything you might do with only 9 bytes take comparable time. The Nim stdlib 
also has a `split` iterator which lets you iterate over the split fields 
without creating a new `seq[string]`. Looking at your input file format there 
is no need to even `split` at all, though. So, there are simpler things than 
`memfiles` you can do to speed it up.
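The string re-use idea could be sketched like this, using the `readLine(f, line)` overload that fills an existing `var string` instead of returning a new one. The proc name `ytzReuse` and the initial capacity of 64 are arbitrary choices for illustration.

```nim
import tables, parseutils

proc ytzReuse(path: string): uint =
  ## Same computation, but every line is read into one re-used string.
  var
    table = initTable[uint, uint]()
    line = newStringOfCap(64)            # allocated once, re-filled per line
    n: uint
  let f = open(path)
  defer: f.close()
  while f.readLine(line):                # overwrites `line` in place
    if parseUInt(line, n) == 0:          # whole line is the number; no `split`
      quit("Parse error: " & line)
    table.mgetOrPut(n, 0'u) += n
  for val in table.values:
    result = max(result, val)
```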

Also, because you only have 790 unique lines, it should be much faster to get 
the max value by looping over the `Table` after you are done creating it. That 
avoids a 2nd lookup per line, and 790 iterations are far fewer than the roughly 
100,000 lines in the file.

So, this is a faster version of your program: 
    
    
    import tables, os, parseutils
    
    proc ytz(path: string): uint =
        var
            table = initTable[uint, uint]()
            n: uint
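        for numStr in lines(path):
            # The whole line is the number, so no `split` is needed.
            if parseUInt(numStr, n) == 0:
                quit("Parse error: " & numStr)
            # One lookup per line: insert 0 if the key is missing, then add.
            table.mgetOrPut(n, 0) += n
        # Only ~790 entries, so finding the max afterwards is cheap.
        for key, val in table:
            result = max(result, val)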
    
    proc main() =
        echo ytz(paramStr(1))
    
    main()
    
    

Incidentally, when posting code to this forum, if you put `Nim` after the 
triple backquote it will be rendered with color syntax highlighting.
