As @joppez says, zippy beating Python above is not very surprising, since it 
just decompresses the file in a single pass after having read the entire file 
into memory. Streaming implementations tend to be slower (but of course use 
less memory, which is kind of the point).
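
Roughly, the whole-file approach boils down to something like this (a sketch, 
not zippy's actual code; uncompress and dfGzip are the zippy names as I 
remember them, so double-check against the zippy docs):
    
    
    import zippy             # nimble install zippy
    import std/strutils
    
    proc countLinesWholeFile(path: string): int =
      # Slurp the whole .gz file into memory ...
      let compressed = readFile(path)
      # ... decompress it in a single pass ...
      let text = uncompress(compressed, dfGzip)
      # ... then split lines out of the already-decompressed buffer.
      for _ in text.splitLines:
        inc result
    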

As for why gzipfiles is slower than Python: it uses std/streams, which is 
difficult to use efficiently. For example, the default readLine implementation 
(which gzipfiles does not override) looks like:
    
    
    proc readLine*(s: Stream, line: var string): bool =
      # [...]
      line.setLen(0)
      while true:
        var c = readChar(s)
        if c == '\c':
          c = readChar(s)
          break
        elif c == '\L': break
        elif c == '\0':
          if line.len > 0: break
          else: return false
        line.add(c)
      result = true
    
    

readChar just calls readDataImpl with a buffer length of 1; that means 
gzipfiles will call a function _pointer_ (which then calls gzread) for every 
single byte of the file. If anything, I'm surprised it even gets close to 
Python's performance; the reason may be that zlib does some internal buffering, 
so at least the decompression algorithm isn't invoked for every single call. 
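
The fix on the gzipfiles side would be to buffer: pull data out of the stream 
in large chunks via readData and split lines out of that buffer, so the 
function pointer (and gzread) is hit once per chunk rather than once per byte. 
Something along these lines (my own sketch over a plain Stream, not actual 
gzipfiles code):
    
    
    import std/[streams, strutils]
    
    iterator bufferedLines(s: Stream): string =
      ## Yields lines, reading the stream in 16 KiB chunks instead of
      ## one readChar call per byte.
      var buf = newString(16384)
      var pending = ""
      while true:
        let n = s.readData(addr buf[0], buf.len)   # one indirect call per chunk
        if n <= 0: break
        pending.add(buf[0 ..< n])
        var start = 0
        while true:
          let nl = pending.find('\n', start)
          if nl < 0: break
          var line = pending[start ..< nl]
          if line.len > 0 and line[^1] == '\r':    # strip a trailing '\r' ('\c')
            line.setLen(line.len - 1)
          yield line
          start = nl + 1
        pending = pending[start .. ^1]             # keep the partial last line
      if pending.len > 0:
        yield pending
    
    
A proper version would go through the readLineImpl hook so the regular 
readLine call picks it up, but the buffering is the part that matters for 
performance.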
