Nim's `Table` and `HashSet` should already be caching the hash code values in 
their tables and using them as a "comparison prefix" (comparing string values 
only if the integer hash codes already match). The string hash of each new 
input cannot be eliminated, though: it has to be computed at least once anyway.
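
To make the "comparison prefix" idea concrete, here is a minimal sketch of an 
open-addressing set that stores the hash code next to each key. It is an 
illustration only, not the stdlib implementation: `LineSet`, `Slot`, 
`cachedHash` and `containsOrIncl` are hypothetical names, the capacity is 
assumed to be a power of two, and there is no resizing, so the table must stay 
larger than the number of keys.

```nim
import std/hashes

type
  Slot = object
    hcode: Hash     # cached hash of `key`; 0 is reserved to mean "empty slot"
    key: string
  LineSet = object  # hypothetical type for illustration, not a stdlib type
    slots: seq[Slot]
    count: int

proc initLineSet(capacity: int): LineSet =
  # Assumption: capacity is a power of two and exceeds the number of keys,
  # because this sketch never grows the table.
  result.slots = newSeq[Slot](capacity)

proc cachedHash(key: string): Hash =
  # The one unavoidable string hash per input. Remap 0 so it can mark "empty".
  result = hash(key)
  if result == 0: result = 1

proc containsOrIncl(t: var LineSet, key: string): bool =
  ## Returns true if `key` was already present, otherwise inserts it.
  let hc = cachedHash(key)
  let mask = t.slots.len - 1
  var i = hc and mask
  while t.slots[i].hcode != 0:
    # Cheap integer compare first; the string compare runs only when the
    # cached hash codes already match.
    if t.slots[i].hcode == hc and t.slots[i].key == key:
      return true
    i = (i + 1) and mask          # linear probing
  t.slots[i] = Slot(hcode: hc, key: key)
  inc t.count
  result = false
```

With this layout a probe that lands on a different key almost always costs one 
integer comparison rather than a string comparison, so what remains per input 
is the single call to `hash`.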

My guess is the lines are long-ish (more than 10 chars, say). The default 
Nim string hash function could be faster, especially for longer strings. E.g., 
the current default does byte-at-a-time manipulations, while 
8-byte/word-at-a-time processing can usually get about as good bit mixing much 
faster, especially for long strings like lines in files. Such a replacement 
might take the 60% hashing time in this particular benchmark down to 10-15%, 
for a 2X-ish overall speed-up.
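
As a rough sketch of what word-at-a-time hashing could look like (not a 
proposal for the actual stdlib replacement), the proc below consumes 8 bytes 
per step and mixes with a multiply and an xor-shift. The name `hashWords`, the 
multiplier and the shift amounts are my own assumptions, untuned and untested 
for hash quality; it also assumes a 64-bit native target, since `copyMem` is 
not available at compile time or on the JS backend.

```nim
import std/hashes

proc hashWords(s: string): Hash =
  ## Hypothetical word-at-a-time string hash (illustration only).
  const mult = 0x9E3779B97F4A7C15'u64   # arbitrary odd 64-bit multiplier
  var h = uint64(s.len) * mult          # unsigned arithmetic wraps around
  var i = 0
  while i + 8 <= s.len:
    var w: uint64
    copyMem(addr w, unsafeAddr s[i], 8) # load 8 bytes as one word
    h = (h xor w) * mult
    h = h xor (h shr 29)
    inc i, 8
  # Fold the remaining 0..7 bytes into one final word.
  var tail: uint64
  var shift = 0
  while i < s.len:
    tail = tail or (uint64(ord(s[i])) shl shift)
    inc shift, 8
    inc i
  h = (h xor tail) * mult
  h = h xor (h shr 32)
  result = cast[Hash](h)                # assumes Hash is 64 bits wide
```

For a line of a few dozen characters the main loop runs roughly one eighth as 
many iterations as a byte-at-a-time loop. To try it with `Table` or `HashSet`, 
one option is to wrap the keys in a `distinct string` type and define `hash` 
for that type in terms of `hashWords`.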

I have benchmarked C++ for these kinds of tests, and I suspect the current STL 
would not be faster than Nim with `-d:release` in this case. (C++, too, could 
benefit from a faster default string hash.)
