> `for l in lines(file)` should be slightly faster than your `while` loop.

That actually made more of a difference than I would've expected!
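
For anyone following along, the two patterns look roughly like this (a minimal 
sketch with a made-up plain-text file name; the actual code reads through 
gzip):

    # Explicit loop: one readLine call per iteration, reusing the string buffer.
    let f = open("example.nt")  # hypothetical file name
    var count = 0
    var line: string
    while f.readLine(line):
      inc count
    f.close()

    # Iterator version: the stdlib drives the loop and buffering for you.
    var count2 = 0
    for l in lines("example.nt"):
      inc count2

Per line the difference is tiny, but over a large file it adds up.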

> There is no reason to assume that though since you're mostly using Python's 
> interface to compiled code.

To expand a bit on this: many of Python's libraries are actually C libraries 
with a Python interface. This means that as long as you're just calling into 
the library, things stay fast. However, as soon as you start doing actual work 
on the Python side, or even shuffling data between libraries, you'll notice 
things slowing down. The Nim standard library module for gzip is likely not as 
optimised as the Python one, hence the slowdown.

I tried with Zippy, doing something similar to what your original code does:
    
    import std/[monotimes, times, strutils]
    import zippy

    let filePath = "somelargfile.nt.gz"
    var count = 0

    let startTime = getMonoTime()
    # Decompress the whole file into memory in one go, then walk the lines.
    let data = filePath.readFile.uncompress()
    for line in data.splitLines:
      inc count
      if count mod 1_000_000 == 0:
        # inSeconds truncates (and is 0 during the first second), so use
        # milliseconds for the lines-per-second estimate.
        let elapsedMs = (getMonoTime() - startTime).inMilliseconds
        let lps = count.float / (elapsedMs.float / 1000.0)
        echo now(), " ", count, " ", lps

    echo getMonoTime() - startTime, " ", count
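
Note that Zippy is a third-party package rather than part of the standard 
library, so you'll need a `nimble install zippy` before this compiles.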

And on my test input this ran 2.26 times faster than the Python code (measured 
using hyperfine) when compiled with `-d:release`. There are other flags you can 
play with as well, but that's a bit of an advanced topic; with some minor flag 
tweaking I managed to push it to 2.75x.
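
For the curious: `-d:danger` is the usual next step after `-d:release`; it 
additionally disables runtime checks such as bounds checking, trading safety 
for a bit more speed.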
