On Thu, Dec 17, 2009 at 05:03:11PM +0100, Toke Eskildsen wrote: > A third alternative would be to count the number of unique datetime-values > and make a compressed representation, but that would make the creation of > the order-array more complex.
This is what we're doing in Lucy/KinoSearch, though we prepare the ordinal arrays at index time and mmap them at search time. The provisional implementation has a problem, though. We use a hash set to perform uniquing, but when there are a lot of unique values, the memory requirements of the SortWriter go very high. Assume that we're sorting string data rather than date time values (so the memory occupied by all those unique values is significant). What algorithms would you consider for performing the uniquing? Marvin Humphrey --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org