Sure enough, I did this [here](https://github.com/nim-lang/Nim/pull/16055) 
(yeah, I know tests are failing at the moment...) and now, with a gcc-10.2 PGO 
build on Skylake, this simple program runs that benchmark over 3x faster 
(within 1.4x of the SIMD-optimized C++ "on demand" entry, within 10% of the 
"D fast" entry, and faster than all the Rust entries):
    
    
    import parsejson  # the jsonTokens iterator comes from the PR linked above
    
    type Coordinate = tuple[x: float, y: float, z: float]
    
    proc calc(path = "/tmp/1.json"): Coordinate =
      var x = 0.0
      var y = 0.0
      var z = 0.0
      var n = 0
      for t in jsonTokens(path, false, false, false):
        if t.tok == tkFloat:  # valid float
          case t.a            # last string (strFloats,Ints)
          of "x": x = x + t.f # a[0] is actually slower!
          of "y": y = y + t.f
          of "z": z = z + t.f; n.inc
          else: discard
      result = (x: x/float(n), y: y/float(n), z: z/float(n))
    
    echo calc()
    
    

To be honest, all that SIMD effort is nice, but 1.4x isn't _that_ huge a 
boost. Simply not duplicating work gets you almost all the way there.
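To make "not duplicating work" concrete, here is a minimal sketch of the idea 
(the `Token` type and `scanFloatToken` are my own illustration, not the PR's 
actual internals): the lexer converts the number via `std/parseutils` while 
scanning the token, so the float text is read once rather than validated and 
then re-parsed by a separate `parseFloat` call.
    
    
    import std/parseutils
    
    type Token = object
      a: string   # raw token text, kept for key comparisons
      f: float    # numeric value, captured during the scan itself
    
    # Hypothetical lexer step: convert the float while scanning it, so no
    # later pass has to re-read the same bytes with parseFloat.
    proc scanFloatToken(buf: string; pos: var int; t: var Token): bool =
      let n = parseFloat(buf, t.f, pos)   # chars consumed; 0 on error
      if n == 0: return false             # not a float at `pos`
      t.a = buf[pos ..< pos + n]
      pos += n
      true
    
    var p = 0
    var tok: Token
    if scanFloatToken("1.094,", p, tok):
      echo tok.f, " ", tok.a    # 1.094 1.094; p is now past the token
    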

Parallelization, as @mratsim mentioned, could gain much more. Unlike xsv/csv, 
however, JSON has a fluid hierarchical structure, so in the general case 
dividing the work takes a lot of figuring out. Maybe not worth it; truly big 
data usually has a regular format anyway.
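For contrast, here is a minimal sketch (my own illustration, not from the 
benchmark) of why record-per-line formats divide so easily: cut the input at 
arbitrary offsets, snap each cut forward to a newline, and every chunk can be 
parsed independently. General JSON has no such cheap re-sync point, because 
the meaning of a byte depends on all the nesting that precedes it.
    
    
    # Cut `data` into ~equal pieces, snapping each cut forward to the byte
    # just past the next '\n', so chunks contain only whole records.
    proc chunkAtNewlines(data: string; nChunks: int): seq[Slice[int]] =
      var start = 0
      for i in 1 .. nChunks:
        var stop =
          if i == nChunks: data.len
          else: min(data.len, max(start + 1, i * data.len div nChunks))
        while stop < data.len and data[stop - 1] != '\n':
          inc stop                    # not on a record boundary yet
        if stop > start:
          result.add(start ..< stop)
        start = stop
    
    let rows = "a,1\nb,2\nc,3\nd,4\n"
    for s in chunkAtNewlines(rows, 2):
      stdout.write("chunk: ", rows[s])   # each chunk holds whole lines
    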
