Oh, and though you did not say/ask, since this kind of thing is often a little "test drive benchmark", I should say that besides compiling with `-d:danger`, you would be better off using the `split` iterator inside one more level of loop nesting.
Beyond that, with `string` keys as you have here, it is likely faster to use a `Table` with `histo.mgetOrPut(word, 0).inc`, since `Table` saves hash codes and compares those first. That means it does only the final `string` comparison on a successful lookup/update and no `string` comparison at all on novel keys. Depending upon the shape of your distribution, `Table` could be dramatically faster, but those hash codes use a little space, too. So, for example, if the shape is such that the `CountTable` can stay in L1/L2 cache but a regular `Table` cannot, `Table` might not win (or not win by as much). There are other things you can do, like using `cligen/mslice` stuff, but the above ideas are probably the lowest hanging fruit.
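
To make the two suggestions concrete, here is a minimal sketch, not your actual program. The proc name `wordHisto`, the sample text, and splitting on a single space are all my assumptions for illustration; only `split`, `Table` and `mgetOrPut(word, 0).inc` come from the advice above.

```nim
import std/[tables, strutils]

proc wordHisto(text: string): Table[string, int] =
  ## Hypothetical helper: counts words with a plain `Table`, not `CountTable`.
  result = initTable[string, int]()
  for line in text.splitLines:         # outer loop: one line at a time
    for word in line.split(' '):       # inner loop: `split` iterator, no seq is built
      if word.len > 0:                 # repeated spaces yield empty fields; skip them
        result.mgetOrPut(word, 0).inc  # put 0 for a novel key, then bump in place

let histo = wordHisto("a b a\nc  b a")
echo histo["a"]   # 3
```

Build it with something like `nim c -d:danger histo.nim` when timing. Whether this beats `CountTable` on your data still depends on the key distribution and cache footprint, so measure both.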