On Sat, Oct 30, 2010 at 7:01 AM, Earwin Burrfoot <ear...@gmail.com> wrote:
> Mathematically an inverted index is keyed by strings. Any strings.
> Empty term is just a case of a string of length 0.
> So, for consistency, Lucene should support them. TermsEnum.seek("")
> should position you into very beginning of terms list, etc.
> If you drop the support, you have to check zero length damn
> eeeeverywhere in the API where you accept terms. Or, thoroughly
> document unpredictable erratic behaviour :)
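
to make the quoted point about seek("") concrete, here is a minimal sketch. it does not use Lucene at all: the term dictionary is modeled as a plain java.util.TreeSet, and the class EmptyTermSeek and the helper seekCeil are hypothetical names made up for illustration. it only shows that the empty string sorts before every other string, so a "seek to the smallest term >= target" lookup with "" lands at the very beginning of the list.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class EmptyTermSeek {
    // Hypothetical stand-in for a term dictionary lookup (not Lucene's API):
    // returns the smallest term >= target, or null if there is none.
    static String seekCeil(TreeSet<String> terms, String target) {
        return terms.ceiling(target);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList("lucene", "apache"));

        // Without an empty term in the index, seeking "" lands on the first term.
        System.out.println("first term: [" + seekCeil(terms, "") + "]");

        // With an empty term present, it sorts first and is an exact match.
        terms.add("");
        System.out.println("first term: [" + seekCeil(terms, "") + "]");
    }
}
```
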
well, we are checking this already, in a lot of the analyzers. as i said originally, the biggest problems that we *must* solve are:

1. try to prevent the performance trap i mentioned, where people create the empty term as a mega-stopword without realizing it.
2. fix the analyzers to be consistent with regard to the empty term... for example, if we decide the empty term is supported, then they shouldn't be arbitrarily removing empty-term tokens.

as far as TermsEnum, i myself have already had to special-case the empty term in TermsEnum implementations before... and I'm pretty fucking sure that we have long-standing bugs if you have an empty term anywhere in your index (e.g. FuzzyQuery will divide by 0 to scale the boost, and you will get a strange exception from your collector because it will then have NaN/Inf/some sentinel value).

just saying, it's problematic today; doing nothing and leaving it in the messy, ambiguous situation it is now is not an option.
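
a rough sketch of the divide-by-0 failure mode mentioned above. this is not FuzzyQuery's actual code: the formula (1 - editDistance / termLength), the class EmptyTermBoost, and both method names are assumptions made up for illustration. the point is only that float division by zero in Java does not throw; it quietly yields NaN or Infinity, which then surfaces much later (e.g. in a collector) as a confusing error.

```java
public class EmptyTermBoost {
    // Hypothetical, simplified similarity: no guard for a zero-length term.
    static float naiveSimilarity(int editDistance, int termLength) {
        return 1.0f - ((float) editDistance / (float) termLength);
    }

    // Same hypothetical formula with the empty term special-cased.
    static float guardedSimilarity(int editDistance, int termLength) {
        if (termLength == 0) {
            // Assumed convention: an empty term matches only an empty target.
            return editDistance == 0 ? 1.0f : 0.0f;
        }
        return 1.0f - ((float) editDistance / (float) termLength);
    }

    public static void main(String[] args) {
        // 0f / 0f is NaN; 3f / 0f is Infinity. Neither throws.
        System.out.println(naiveSimilarity(0, 0));
        System.out.println(naiveSimilarity(3, 0));
        // The guarded version stays finite for the empty term.
        System.out.println(guardedSimilarity(3, 0));
    }
}
```

whether the guard belongs in each query or the empty term gets rejected earlier (at analysis or indexing time) is exactly the decision this thread is about; the sketch takes no side.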