Hi there,

Looking at my index (about 1M docs), I see a lot of unique terms: more than 8M, which is a significant part of my total term count. These are very likely useless terms, binaries or other meaningless numbers that come with a few of my docs. I am totally fine with deleting them, so that these terms would be unsearchable. Thinking about it, I realize that:

1. It is impossible to know a priori whether a term is unique or not, so I cannot add these terms to my stop words.
2. Performance suffers because my cached "hot spot" chunks (4 KB) contain useless data. This is a problem for me, as I am short on memory.

Q: Assuming the index is static, is there a way to delete all unique terms, at least from the term dictionary (tim and tip) files? Do I need to modify the source code for this, and if so, which part of it? Would I get a significant query-time performance increase on top of the better RAM usage? Are there any existing updateProcessor classes that identify non-human-readable terms?

Thanks in advance,
Manu