> but because of the cost of preparing the inputs (i.e. text extraction) to Lucene.
You're right! That, and also the cost of fetching the document in systems
where the content lives on other servers/systems. Reindexing is usually
(depending on your analysis chain) the cheapest step.

Shai

On Tue, May 11, 2010 at 7:22 AM, Babak Farhang <farh...@gmail.com> wrote:

> >> My take on it is that if someone wants to update the catch-all field, then
> >> reindexing the document may not be such a bad idea anyway. The purpose of
> >> those incremental updates is to cope w/ high frequency of updates, which
> >> usually happen on metadata fields, and not title.
> >
> > I agree.
>
> I too agree with the general gist of this argument.
>
> As an aside, just to add another dimension to this discussion (perhaps
> now the net is cast too wide), Lucene users often want incremental
> updates not because of the cost of reindexing the document inside
> Lucene, but because of the cost of preparing the inputs (i.e. text
> extraction) to Lucene.
>
>
> On Mon, May 10, 2010 at 2:40 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
> > On Mon, May 10, 2010 at 4:05 AM, Shai Erera <ser...@gmail.com> wrote:
> >> That's an interesting scenario Mike.
> >>
> >> Previously, I only handled boolean-like terms, as the scenarios we were
> >> asked to support involved just those types of terms. Obviously, when the
> >> approach allows for more, more scenarios pop to mind :).
> >
> > OK.
> >
> >> I think we may still be able to resolve that case, but it becomes much more
> >> complicated. My design approach of adding the +/- affected the entire
> >> posting element, whereas the scenario you describe affects the positions of
> >> the posting element. This calls for a more complicated design and solution.
> >
> > Right.
> >
> >> My take on it is that if someone wants to update the catch-all field, then
> >> reindexing the document may not be such a bad idea anyway. The purpose of
> >> those incremental updates is to cope w/ high frequency of updates, which
> >> usually happen on metadata fields, and not title.
> >
> > I agree.
> >
> >> But since one could add the 'tags' to the catch-all field as well, it brings
> >> us to the same point - how do I remove the positions of term X that relate
> >> to the tag X and not the potentially original term X that existed in the
> >> document?
> >>
> >> This is a very advanced case (and interesting). I don't want to hold up the
> >> discussion on it, but want to make sure we do not deviate from getting the
> >> simpler cases in first. Depending on the API, this might be very easy
> >> to solve, but might also complicate matters. Maybe, for an
> >> incr-field-updates-v1, we can do without it?
> >
> > Definitely, let's take this (incrementally updating the positions as
> > well) out of scope for the first cut, when we actually start building
> > things. One simple way to do this might be to only allow incremental
> > update on fields that have omitTFAP=true.
> >
> > When brainstorming/designing a new feature, I like to cast a wide net
> > during the discussion/thinking (what we are doing now), but then when
> > it comes to what to actually build for phase one we'll pull it way back
> > in and aim for baby steps / progress not perfection. We are able to
> > do much more imagining than actual code writing :)
> >
> > The wide net during brainstorming gives us a better view/context of
> > the road ahead, e.g. to validate that the baby step is in the right
> > direction, so that it doesn't preclude other things we might imagine
> > later.
> >
> > In this case, it does sound like the approach should work (in theory)
> > fine w/ positions, too.
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
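
For readers following the thread, here is a minimal sketch of how the "+/-" idea for boolean-like terms might look, restricted to the case Mike suggests for a first cut (fields with omitTFAP=true, so a posting is just a doc ID with no positions). This is not Lucene code and not the design from the thread: the class `PostingDeltaSketch` and its methods `index`, `update` and `postings` are hypothetical names, and the in-memory maps stand in for whatever on-disk structures a real implementation would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration only; none of these names exist in Lucene. */
class PostingDeltaSketch {
    // term -> sorted doc IDs, as written when the documents were first indexed
    private final Map<String, TreeSet<Integer>> base = new TreeMap<>();
    // term -> (doc ID -> true for '+', false for '-'); a later update for the
    // same term and doc replaces the earlier one
    private final Map<String, TreeMap<Integer, Boolean>> deltas = new TreeMap<>();

    /** Records a posting produced by a full (re)index of the document. */
    void index(String term, int docId) {
        base.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    /** Records an incremental '+' (add) or '-' (remove) for one term of one doc. */
    void update(String term, int docId, boolean add) {
        deltas.computeIfAbsent(term, t -> new TreeMap<>()).put(docId, add);
    }

    /** Returns the effective doc IDs for a term: base postings with deltas applied. */
    List<Integer> postings(String term) {
        TreeSet<Integer> merged = new TreeSet<>();
        TreeSet<Integer> indexed = base.get(term);
        if (indexed != null) {
            merged.addAll(indexed);
        }
        TreeMap<Integer, Boolean> pending = deltas.get(term);
        if (pending != null) {
            for (Map.Entry<Integer, Boolean> e : pending.entrySet()) {
                if (e.getValue()) {
                    merged.add(e.getKey());     // '+' makes the doc match the term
                } else {
                    merged.remove(e.getKey());  // '-' hides the whole posting element
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

Because a '-' here removes the entire posting element, the sketch cannot tell a tag X apart from an original term X in a catch-all field. That is exactly the positions problem the thread puts out of scope for the first version.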