@dlesnoff

> I believe this article might convince many procedural persons, but I see no 
> functional programming languages in there.

Well, the Nim version could be made to look more functional by declaring more 
variables as immutable with `let` instead of `var` and by doing more work to 
avoid mutable global state. The result could then be translated fairly easily 
into a functional language.
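
As a rough sketch of what such a translation looks like (the function below is 
made up for illustration and is not taken from the benchmark code), a loop that 
would update a `var` accumulator in Nim becomes a recursive helper in Haskell 
that passes the loop index and the accumulator along as arguments:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical example: sum of i*i for i in 0 .. n-1.
-- An imperative version would mutate a counter and an accumulator;
-- here both are arguments of the local recursive function `go`.
sumSquares :: Int -> Int
sumSquares n = go 0 0
  where
    -- The bang patterns keep both arguments strict so that no chain of
    -- unevaluated thunks builds up while the "loop" runs.
    go !i !acc
      | i >= n    = acc
      | otherwise = go (i + 1) (acc + i * i)
```

Nothing here is mutated or global, and since `go` is tail recursive one would 
expect a compiler to turn it back into a plain loop.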

The question would be which functional language to use:

  1. F# wouldn't look all that much different from the transformed Nim code 
described above, other than using recursive functions to represent loops. It 
would likely run at about the same speed as the C# code without SIMD 
instructions, although the "Hot Loop" could perhaps be improved slightly (as it 
could in C#), and it is likely to be more concise than the C# (just as the Nim 
code is). I think F#/C# would need an adjustment if multi-threading were used: 
the buffers for the "avg_ev" function would need to be pre-allocated per thread 
on the heap and indexed by the "threadid", instead of using the stack-based 
arrays used in C/Nim.
  2. Other somewhat functional languages like Scala are likely to fall into the 
same speed range as C#/F# above.
  3. The only purely functional language that has a chance of coming into the 
speed range of C/Nim would be Haskell using mutable unboxed arrays. I wouldn't 
expect it to be as fast as the C/Nim code, since it doesn't have automatic SIMD 
vectorization and the limited SIMD primitives it does have don't seem to fit 
the use case (no bitwise SIMD operations and no SIMD gather operations). The 
allocation of garbage-collected buffers on each call to the "avg_ev" function 
would also need to be compensated for under multi-threading, as for F#/C# in 
point 1 above. Also, I'm not sure that the Haskell runtime will handle 
multi-threading as well as the low-level threading used in C/Nim, because the 
work is split so finely: each "chunk" takes less than a millisecond on average 
to process. That said, I might take a crack at it to see where one might end 
up.
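
For anyone unfamiliar with what "mutable unboxed arrays" means in Haskell, here 
is a minimal sketch using `STUArray` from the `array` package that ships with 
GHC. The `histogram` function is a hypothetical stand-in, not the actual 
"avg_ev" logic; it only shows the pattern of filling an unboxed buffer in place 
inside `ST` and then freezing it into an immutable result:

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, elems)

-- Hypothetical helper: count how often each value occurs.
-- Assumes every x satisfies 0 <= x < buckets; anything else raises an
-- index error at run time.
histogram :: Int -> [Int] -> UArray Int Int
histogram buckets xs = runSTUArray $ do
  -- Unboxed mutable buffer of machine Ints, initialised to zero.
  counts <- newArray (0, buckets - 1) 0
  forM_ xs $ \x -> do
    c <- readArray counts x
    writeArray counts x (c + 1)
  -- runSTUArray freezes the buffer without copying when the block ends.
  pure counts
```

For example, `elems (histogram 3 [0, 2, 2, 1, 2])` gives `[1,1,3]`. Note that 
every call allocates a fresh garbage-collected buffer, which is the per-call 
cost mentioned in point 3; this sketch makes no attempt to reuse buffers per 
thread.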

