Hi,

Lucene stores the term string because it may need it to run prefix or range queries. We don't have a hash-based terms dictionary right now, but I know some people have written one because they don't need support for these queries; see for instance the Earlybird paper [1]. So if you can find a perfect hashing function, you can just replace your terms with their hashes.
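To make the trade-off concrete, here is a rough sketch in plain Java of a dictionary keyed by hash only. It is not Lucene code and not what Earlybird does; the class name HashedTermsSketch and its methods are made up for illustration. It uses FNV-1a as a stand-in, which is not a perfect hash, so two terms that collide would silently share a postings list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a terms dictionary that keeps hashes, not strings.
class HashedTermsSketch {
    // hash of term -> doc IDs containing it; the original term string is discarded
    private final Map<Long, List<Integer>> postings = new HashMap<>();

    // 64-bit FNV-1a over the UTF-8 bytes. Assumption: used here as a stand-in;
    // it is NOT a perfect hash, so collisions are possible in principle.
    static long hash(String term) {
        long h = 0xcbf29ce484222325L;           // FNV offset basis
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;                // FNV prime
        }
        return h;
    }

    void add(String term, int docId) {
        postings.computeIfAbsent(hash(term), k -> new ArrayList<>()).add(docId);
    }

    // Exact-term lookup still works: hash the query term and probe the map.
    List<Integer> lookup(String term) {
        return postings.getOrDefault(hash(term), Collections.emptyList());
    }

    public static void main(String[] args) {
        HashedTermsSketch index = new HashedTermsSketch();
        index.add("lucene", 1);
        index.add("lucene", 4);
        index.add("lucid", 2);
        System.out.println(index.lookup("lucene"));
        // A prefix such as "luc" hashes to an unrelated value, so it cannot be
        // used to find "lucene" or "lucid"; the same goes for range queries.
        System.out.println(index.lookup("luc"));
    }
}
```

The point of the sketch is that once only hashes are kept, the ordering and prefix structure of the original strings is gone, which is why this only suits indexes that never need prefix or range queries.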
[1] http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf

--
Adrien