Sounds like you want a `Table[string, WordNumber]` (word to number) paired with a `seq[string]` (number back to word), where `WordNumber` can be a `uint32` or `uint16` depending upon the size of the vocabulary under consideration.
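A minimal sketch of that pairing, in case it helps. The names `Vocab`, `intern` and `word` are made up for illustration, and `uint32` is just an assumed choice (swap in `uint16` if your vocabulary fits in 65536 words):

```nim
import std/tables

type
  WordNumber = uint32   # assumption: use uint16 for vocabularies of <= 65536 words
  Vocab = object        # hypothetical container name
    ids: Table[string, WordNumber]  # word -> number
    words: seq[string]              # number -> word

proc intern(v: var Vocab, w: string): WordNumber =
  ## Return the number for `w`, assigning the next free one on first sight.
  if w in v.ids:
    return v.ids[w]
  # Refuse to hand out a number that does not fit in WordNumber.
  doAssert uint64(v.words.len) <= uint64(high(WordNumber)),
    "vocabulary too large for WordNumber"
  result = WordNumber(v.words.len)
  v.ids[w] = result
  v.words.add w

proc word(v: Vocab, n: WordNumber): string =
  ## Map a number back to its word.
  v.words[int(n)]

var v: Vocab
var encoded: seq[WordNumber]
for w in ["the", "cat", "saw", "the", "dog"]:
  encoded.add v.intern(w)
```

After this, `encoded` holds one small integer per token, and each distinct string is stored once (as a key in `ids`) plus once more in `words` for the reverse lookup.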
If you want to do even more memory optimization, `lptabz` in [adix](https://github.com/c-blake/adix) can help (e.g. by removing the hash code caching @shirleyquirk mentioned or by providing a `sequint` for bit-granularity arrays of numbers).