> On Wed, Jun 10, 2009 at 3:43 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
> > On Wed, Jun 10, 2009 at 3:19 PM, Yonik Seeley<yo...@lucidimagination.com> wrote:
> >
> >>> And this information about the trie
> >>> structure and where payloads are should be stored in FieldInfos.
> >>
> >> As is the case today, the info is encoded in the class you use (and
> >> it's settings)... no need to add it to the index structure. In any
> >> case, it's a completely different issue and shouldn't be tied to
> >> TrieRange improvements.
> >
> > The problem is, because the details of Trie* at index time affect
> > what's in each segment, this information needs to be stored per
> > segment.
>
> That's the case with the analysis for every field. If you change your
> analyzer in a non-compatible fashion, you need to re-index.
I agree with Mike that information like the data type should be stored in the index, but on the other hand, Yonik is correct, too. If I change my analyzer (and TrieTokenStream is in fact one: an analyzer that creates tokens out of a number), I have to reindex. Storing different indexing settings (precisionStep, payload/position bits) per segment would make merging nearly impossible, so I would not do this (see also Earwin's comment about that).

About releasing 2.9: I would really like to leave this optimization out of 2.9. We can still add it after 2.9 as an optimization. The number of bits encoded into the TermPosition is simply 0 for indexes created with 2.9. (This is a really cool idea, thanks Yonik. I was missing exactly that: you do not need to convert the bits, you can put them directly into the index as an int and use them on the query side!) With later versions, you could also shift the lower bits into the TermPosition and tell TrieRange to filter them.

I would like to go forward with moving the classes into the right packages and optimizing how queries and analyzers are created (only one class for each). The idea from LUCENE-1673 to use static factories to create these classes for the different data types seems more elegant and simpler to maintain than the current approach (having a class for each bit size). So I think I will start with LUCENE-1673 and try to present something usable soon (but without payloads, so the payload/position-bits setting is "0").

Now the open question: which name for the numeric range queries/fields? :-(

Uwe
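
For readers following the thread, here is a rough sketch of what precisionStep and the "lower bits" mean in the discussion above. It is not Lucene code: the class `TrieSketch` and the method `prefixTerms` are hypothetical names invented for this illustration, and it ignores details such as sign handling and the actual term encoding. It only shows that each precision level keeps the value shifted right by a multiple of precisionStep, and that the bits dropped by the shift are the ones that could, as proposed, be stored in the TermPosition instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
public class TrieSketch {

    /**
     * For each precision level (shift = 0, step, 2*step, ... below 64) returns
     * a triple {shift, prefix, droppedLowBits}, where prefix is the value with
     * the lowest 'shift' bits removed and droppedLowBits are those removed bits.
     */
    static List<long[]> prefixTerms(long value, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        List<long[]> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            long prefix = value >>> shift;              // coarser term for this level
            long dropped = value & ((1L << shift) - 1); // bits lost at this level
            terms.add(new long[] { shift, prefix, dropped });
        }
        return terms;
    }

    public static void main(String[] args) {
        for (long[] t : prefixTerms(0x12345678L, 8)) {
            System.out.println("shift=" + t[0]
                + " prefix=0x" + Long.toHexString(t[1])
                + " droppedLowBits=0x" + Long.toHexString(t[2]));
        }
    }
}
```

A smaller precisionStep yields more terms per value (more index data, fewer terms to visit per range query); with the payload/position-bits setting at "0", as planned for 2.9, the dropped bits are simply not stored.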