From: Mark Harwood [markharw...@yahoo.co.uk]
> Good point, Toke. Forgot about that. Of course doubling the number
> of hash algos used to 4 increases the space massively.

Maybe your hashing idea could work even with collisions? With your original two-hash suggestion, we are all but certain to get collisions. However, we can still uniquely identify the right document, because the UID is also stored: search for the hashes, iterate over the results and get the UID for each.

When an update is requested for an existing document, the indexer extracts the UIDs from all the documents that match the hash. It then deletes by the hash terms and re-indexes all the documents that were "false" collisions, i.e. those that share the hash but have a different UID.

As the number of unique hash values as well as the number of hash functions can be adjusted, this could be a nicely tweakable performance-vs-space trade-off.

This will only work if it is possible to re-create the documents from stored terms or by requesting the data from outside of Lucene by UID. Is this possible with your setup, eks dev?
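
To make the update procedure concrete, here is a rough sketch of it as a toy in-memory model in Python. It is not Lucene code and uses no Lucene API. The class `HashKeyedIndex`, the `source` dict standing in for "data outside of Lucene", the single CRC32-based hash term and the `num_buckets` parameter are all assumptions made for illustration; the two-hash variant would behave the same way with two terms per document.

```python
import zlib


class HashKeyedIndex:
    """Toy model of an index where deletion is only possible by hash term."""

    def __init__(self, source, num_buckets=1024):
        self.source = source            # hypothetical external store: uid -> content
        self.num_buckets = num_buckets  # fewer buckets: less space, more collisions
        self.buckets = {}               # hash term -> {uid: content}

    def hash_term(self, uid):
        # Assumed hash function; any stable hash of the UID would do.
        return zlib.crc32(uid.encode("utf-8")) % self.num_buckets

    def update(self, uid, content):
        """Add or replace a document; return the UIDs that had to be re-indexed."""
        self.source[uid] = content
        term = self.hash_term(uid)
        # 1. Search by hash term and collect the UIDs of the false collisions.
        collided = [u for u in self.buckets.get(term, {}) if u != uid]
        # 2. Delete by hash term; this also removes the false collisions.
        self.buckets.pop(term, None)
        # 3. Re-index the false collisions from the external store,
        #    then index the new version of the requested document.
        bucket = self.buckets.setdefault(term, {})
        for u in collided:
            bucket[u] = self.source[u]
        bucket[uid] = content
        return collided

    def get(self, uid):
        # Search by hash term, then pick the exact document by its stored UID.
        return self.buckets.get(self.hash_term(uid), {}).get(uid)
```

The list returned by `update` is the extra re-indexing work caused by collisions, so shrinking `num_buckets` saves space at the cost of more re-indexing per update. Step 3 is also where the requirement above shows up: it only works because `source` can hand back the collided documents by UID.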