I know that the person who submitted the Finish Analyzer, a long time ago, had mentioned this same thing: storing multiple variations of a word in the same position.
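To make the idea concrete, here is a minimal, self-contained Java sketch of how position increments could map to positions. It does not use Lucene's classes: SketchToken, PositionSketch, positions() and example() are hypothetical names made up for illustration, and the convention that a first token with increment 1 lands at position 0 is an assumption of the sketch. An increment of 0 stacks a term on the previous position, and an increment of 2 leaves a gap, e.g. where a stop word was removed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PositionSketch {

    /** Hypothetical stand-in for a token; NOT Lucene's Token class. */
    static final class SketchToken {
        final String text;
        final int positionIncrement; // 0 = same position as the previous token

        SketchToken(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Groups terms by absolute position, computed as a running sum of increments. */
    static Map<Integer, List<String>> positions(List<SketchToken> stream) {
        Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
        int position = -1; // assumption: a first token with increment 1 lands at 0
        for (SketchToken token : stream) {
            position += token.positionIncrement;
            byPosition.computeIfAbsent(position, k -> new ArrayList<>()).add(token.text);
        }
        return byPosition;
    }

    /** Token stream for "10cm of cable", with the stop word "of" removed. */
    static List<SketchToken> example() {
        List<SketchToken> stream = new ArrayList<>();
        stream.add(new SketchToken("10cm", 1));  // original form
        stream.add(new SketchToken("10", 0));    // variation at the same position
        stream.add(new SketchToken("cm", 0));    // variation at the same position
        stream.add(new SketchToken("cable", 2)); // gap of one position for "of"
        return stream;
    }

    public static void main(String[] args) {
        System.out.println(positions(example()));
    }
}
```

Running main prints {0=[10cm, 10, cm], 2=[cable]}: three forms share position 0, and position 1 stays empty. Whether a given phrase query matches such a stream depends on how the real index treats stacked terms, which this sketch does not model.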
Otis

--- Dmitry Serebrennikov <[EMAIL PROTECTED]> wrote:
> Otis Gospodnetic wrote:
>
> >Ah, sorry about bringing up performance, I mixed that with another
> >thread.
> >Anyhow, I still think that setPosition offers a nice feature that
> >some people may want to use. It was on a to do list for a while, and
> >it was there because people requested it, so even though Lucene
> >doesn't use setPosition internally, maybe Lucene-based apps out there
> >are.
> >
> Most likely it would be analyzers for additional languages that would
> make use of this. One example where I have considered using this
> feature was in a special-purpose analyzer that placed multiple forms
> of a token into the same position. For example, a given word "10cm"
> can be parsed into two: "10", "cm". This would allow a document to be
> found when the query includes "10 cm" or "10cm". I ended up doing just
> this, but I do not currently bother with positions, only because I do
> not run phrase queries. However, if phrase queries were needed, I
> think I would probably want to place them at the same position.
>
> Another example where this could be useful would be with languages
> where a single word can be composed of many component words - such as
> German. Perhaps it can also be useful in oriental languages?
>
> Dmitry.
>
> >Otis
> >
> >--- stephane vaucher <[EMAIL PROTECTED]> wrote:
> >
> >>I'm not sure if I understand your question. I'm not trying to
> >>optimise anything. This thread was spawned because the usage of
> >>Token was unclear and inconsistent (I don't see the purpose here of
> >>package-scoped members). The result of this is that a few of us
> >>thought that an immutable Token might be clearer.
> >>
> >>The most simple change (I personally believe it's an essential
> >>change) is to make the members private.
> >>The second change for the object to be immutable would be to remove
> >>the positionIncrement, but since I'm no lucene guru, I can't tell
> >>what is better (hence the email).
> >>
> >>I'll test the simple changes tonight to see if there is a sizable
> >>performance hit, and I'll wait to see if a guru speaks out about the
> >>controversial second change (which is also trivial).
> >>
> >>Stephane
> >>
> >>Otis Gospodnetic wrote:
> >>
> >>>It sounds to me that having the ability to do what point 13 in
> >>>CHANGES states is more important than trying to only slightly
> >>>decrease the number of temporary objects instantiated.
> >>>
> >>>By the way, have you observed or measured the difference in
> >>>performance, memory consumption or anything else, before and after
> >>>your local changes?
> >>>Not having those and making Token immutable for performance reasons
> >>>would be wrong.
> >>>
> >>>Thanks,
> >>>Otis
> >>>
> >>>--- stephane vaucher <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>I've noticed that there is a method public void
> >>>>setPositionIncrement(int positionIncrement) that would probably
> >>>>have to disappear for Token to be immutable. The CHANGES.txt doc
> >>>>seems to mention some good reasons why it was added, but there is
> >>>>no code in CVS that seems to depend on it.
> >>>>
> >>>>From CHANGES:
> >>>>13. Added new method Token.setPositionIncrement().
> >>>>
> >>>>    This permits, for the purpose of phrase searching, placing
> >>>>    multiple terms in a single position. This is useful with
> >>>>    stemmers that produce multiple possible stems for a word.
> >>>>
> >>>>    This also permits the introduction of gaps between terms, so
> >>>>    that terms which are adjacent in a token stream will not be
> >>>>    matched by an exact phrase query. This makes it possible,
> >>>>    e.g., to build an analyzer where phrases are not matched over
> >>>>    stop words which have been removed.
> >>>>
> >>>>    Finally, repeating a token with an increment of zero can also
> >>>>    be used to boost scores of matches on that token. (cutting)
> >>>>
> >>>>Any comments? With an immutable Token, does the positionIncrement
> >>>>still have a reason for being there? If not, then I'll remove
> >>>>getPositionIncrement as well.
> >>>>
> >>>>Stephane
> >>>>
> >>>>Doug Cutting wrote:
> >>>>
> >>>>>stephane vaucher wrote:
> >>>>>
> >>>>>>1) Does anyone mind? Will it break anything?
> >>>>>
> >>>>>It shouldn't break anything.
> >>>>>
> >>>>>>2) Are there unit tests for this? (particularly
> >>>>>>PorterStemFilter).
> >>>>>>The changes are obviously not spectacular, but I prefer not to
> >>>>>>screw everyone up...
> >>>>>
> >>>>>I don't know of any unit tests specifically for this. Mostly this
> >>>>>change will affect compilation. In general though, if you don't
> >>>>>see unit tests for things that you think you might break, then it
> >>>>>never hurts to write more unit tests.
> >>>>>
> >>>>>>3) I've checked out the latest version of lucene, is there
> >>>>>>anything special I need to do if I get the go ahead to check my
> >>>>>>stuff in (like a dev list review)?
> >>>>>
> >>>>>If you're not a regular committer then please send diffs to
> >>>>>lucene-dev before committing and give folks a few days to
> >>>>>consider the changes.
> >>>>>
> >>>>>Doug
> >>>>>
> >>>>>--
> >>>>>To unsubscribe, e-mail:
> >>>>><mailto:[EMAIL PROTECTED]>
> >>>>>For additional commands, e-mail:
> >>>>><mailto:[EMAIL PROTECTED]>