To be clear, I didn't mean to be overcritical of you or your efforts -- good 
job on the stdlib tuning (and thanks!) -- but of the general "1.25x faster in 
lang X than lang Y" kind of subculture. For this particular case, I couldn't 
even reproduce the ratios on various CPUs as accurately as the speed ratios in 
Ben Hoyt's table. So, are you measuring the language, or the CPU, or what? Mix 
in uncontrolled cache competition, no repeated trials, no noise control, etc., 
etc. To me, that level of irreproducibility means you really just get a couple 
of clusters of languages, which is fine as far as it goes, but then people just 
see one table, don't understand this, don't know enough about measurement to 
think critically, and overinterpretation ensues. Even pros mess this up..
routinely, even.
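To make the "repeated trials, noise control" point concrete, here's a minimal sketch in Python (the workload `work` is a made-up stand-in, not anything from the benchmark under discussion): repeat the whole measurement, keep the minimum as the least noise-contaminated estimate, and look at the spread as a rough indicator of interference.

```python
import timeit

def work():
    # Stand-in workload; in a real benchmark this would be the
    # program or snippet whose cross-language ratio you care about.
    return sum(i * i for i in range(10_000))

# Run the workload 100 times per trial, and repeat the trial 5 times.
# min(times) is the least-interfered estimate; the spread (max - min)
# hints at how much cache competition / scheduler noise you picked up.
times = timeit.repeat(work, number=100, repeat=5)
print(f"min={min(times):.4f}s  spread={max(times) - min(times):.4f}s")
```

If the spread is a sizable fraction of the minimum, a single-run "lang X is 1.25x faster" ratio is mostly measuring the machine, not the language.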

Anyway, sorry for the rant. I am not specifically attributing any of these 
problems to you/your post/work. I also analogized it [to 
RosettaCode](https://lobste.rs/s/3byl7t/performance_comparison_counting_words#c_hogcoz).
 I do see real value in that..even just style/API/language learning, but 
Rosetta Code itself has waaaay more languages on that score. :-)

FWIW, the `lowCaseWords` tokenizer with the KM def in that adix/test/wf.nim 
example is not _so_ complicated..just a 13-line `iterator`..easy to "port from 
other lang examples", but maybe just past "too tricky" in a job interview with 
all the other moving parts. In my experience, job interview questions evolve as 
you test them against people. You also get a lot of feedback from candidates. 
So, FNV1a is probably not a "random" choice, but rather email feedback from the 
37th candidate who tried a slew of hash functions. It happens to work really 
well on that KJ text.. ;-)
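For anyone who hasn't seen these pieces, here's roughly what a lowercase-word tokenizer and FNV-1a look like -- a Python sketch for illustration, not the actual Nim code from adix/test/wf.nim (the names `low_case_words` and `fnv1a_64` are mine):

```python
from typing import Iterator

def low_case_words(text: str) -> Iterator[str]:
    # Yield maximal runs of letters, lowercased -- the usual
    # word-frequency tokenization.
    word: list[str] = []
    for ch in text:
        if ch.isalpha():
            word.append(ch.lower())
        elif word:
            yield "".join(word)
            word = []
    if word:
        yield "".join(word)

def fnv1a_64(data: bytes) -> int:
    # Standard 64-bit FNV-1a: XOR each byte in, then multiply by the
    # FNV prime, keeping the running hash in 64 bits.
    h = 0xCBF29CE484222325          # FNV-1a 64-bit offset basis
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

words = list(low_case_words("In the beginning God created"))
print(words)  # ['in', 'the', 'beginning', 'god', 'created']
```

Simple as FNV-1a is, which hash "wins" on a given corpus is an empirical question -- which is exactly the point about it not being a random choice.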
