Re: who clears attributes?

Michael Busch Wed, 12 Aug 2009 00:13:22 -0700

+1. We don't use Solr, but have quite a bunch of medium and
short-sized documents. Plus heaps of metadata fields.


I'm yet to read Uwe's example, but I feel I'm a bit misunderstood by


Did you read it yet? What do you think about it?

some of you. My gripe with new API is not that it brings us troubles
(which are solved one way or another), it is that the switch and
associated migration costs bring zero benefits in immediate and remote
future.
The only person that tried to disprove this claim is Uwe. Others
either say "the problems are solved, so it's okay to move to the new
API", or "this will be usable when flexindexing arrives". Sorry, the
last phrase doesn't hold its place, this API is orthogonal to
flexindexing, or at least nobody has shown the opposite.

Whether the API is orthogonal to flexible indexing depends on how you define "flexible indexing". I admit the term is vague and probably nowhere clearly defined.

I agree that if flexible indexing means to only change the encoding, i.e. *how* data is stored, e.g. PFOR vs. the current posting format, then yes, we don't need the new TokenStream API for it.

But the goals we have with flexible indexing are more than that. We want to allow customizing *what* data is stored in the inverted index. You can find the very first discussion about flexible indexing, which happened several years ago, in the wiki: http://wiki.apache.org/lucene-java/FlexibleIndexing.

Already in this very early proposal it was suggested to have the following posting formats as a start:

a. <doc>+
b. <doc, boost>+
c. <doc, freq, <position>+ >+
d. <doc, freq, <position, boost>+ >+

For d. you need to change the TokenStream API. How else can we get the boost from the source to the indexer? Of course you can always serialize the additional data into the payload byte array, but if filters want to do something with it, performance suffers. The new API solves this problem very nicely. When we open up the posting format like this, people will want to store different custom things in there. The new TokenStream API is prepared for that; the old one isn't.
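To make the payload-vs-attribute point concrete, here is a rough sketch. It is only an illustration: BoostSketch, BoostAttribute and the helper methods are made-up stand-ins, not the Lucene classes, and the "filter" is reduced to a single method that doubles a per-position boost.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-ins (NOT the Lucene classes): just enough to contrast
// carrying a boost in a payload byte array with carrying it in a typed attribute.
final class BoostSketch {

    // Approach 1: the boost travels inside an opaque payload byte array.
    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(boost).array();
    }

    static float decodeBoost(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // A filter that wants to change the boost has to decode, modify and
    // re-encode the payload for every token, allocating a new array each time.
    static byte[] doubleBoostViaPayload(byte[] payload) {
        return encodeBoost(decodeBoost(payload) * 2f);
    }

    // Approach 2: the boost is a typed attribute shared by the whole chain.
    static final class BoostAttribute {
        float boost = 1f;
    }

    // The same filter now just updates a field in place.
    static void doubleBoostViaAttribute(BoostAttribute att) {
        att.boost *= 2f;
    }

    public static void main(String[] args) {
        byte[] payload = doubleBoostViaPayload(encodeBoost(1.5f));

        BoostAttribute att = new BoostAttribute();
        att.boost = 1.5f;
        doubleBoostViaAttribute(att);

        // Both approaches carry the same value to the "indexer".
        System.out.println(decodeBoost(payload) + " " + att.boost);
    }
}
```

Both variants deliver the same boost to the consumer; the difference is that every filter touching the payload pays for a decode/encode round trip per token, while the attribute variant is a plain field update.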


 Michael

So, what I'm arguing against is adding some code (and forcing users to
migrate) just because we can, with no other reasons.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
