I already started to prepare a patch... Let's open an issue! You could try
it out with your corpus and post numbers.

There are some additional slowdowns with the new API if you do not reuse
TokenStreams, because setting up the attribute maps adds a small cost per
TokenStream instance.
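To illustrate the point above, here is a minimal sketch (hypothetical classes, not Lucene's actual API): a stream that builds its attribute map once in the constructor, so reusing one instance via reset() pays the setup cost a single time instead of once per document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a TokenStream whose constructor sets up an
// attribute map (the small per-instance cost mentioned above).
class SimpleTokenStream {
    static int attributeMapSetups = 0;  // counts the costly setup
    private final Map<Class<?>, Object> attributes = new HashMap<>();
    private String text;

    SimpleTokenStream(String text) {
        this.text = text;
        attributes.put(CharSequence.class, text);  // stand-in for addAttribute()
        attributeMapSetups++;
    }

    // Reuse: swap in new input without rebuilding the attribute map.
    void reset(String newText) {
        this.text = newText;
        attributes.put(CharSequence.class, newText);
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        // Without reuse: one attribute-map setup per document.
        for (String doc : new String[] {"a", "b", "c"}) {
            new SimpleTokenStream(doc);
        }
        int withoutReuse = SimpleTokenStream.attributeMapSetups;

        SimpleTokenStream.attributeMapSetups = 0;
        // With reuse: a single setup, then reset() per document.
        SimpleTokenStream reused = new SimpleTokenStream("a");
        reused.reset("b");
        reused.reset("c");
        int withReuse = SimpleTokenStream.attributeMapSetups;

        System.out.println(withoutReuse + " vs " + withReuse);  // prints "3 vs 1"
    }
}
```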

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Mark Miller [mailto:[email protected]]
> Sent: Monday, August 10, 2009 5:08 PM
> To: [email protected]
> Subject: Re: indexing_slowdown_with_latest_lucene_udpate
> 
> My bet is that it would still be much faster - uncontended synchronization
> is generally very fast, and the reflection-based check is extremely slow.
> 
> - Mark
> 
> Uwe Schindler wrote:
> > The question is whether that would get better if the reflection calls
> > were only done once per class, using an IdentityHashMap<Class,Boolean>.
> > The other reflection code in AttributeSource uses a static cache for
> > such things (e.g. the Attribute -> AttributeImpl mappings in
> > AttributeSource.DefaultAttributeFactory.getClassForInterface()).
> >
> > I could run some tests on that and supply a patch. I was thinking about
> > it but threw the idea away (as it needs some synchronization on the
> > cache Map, whose cost may also outweigh the gain).
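The cache idea discussed above can be sketched as follows. This is a hypothetical illustration, not the actual Lucene patch: the expensive reflective check runs once per class, and later calls hit a synchronized IdentityHashMap<Class,Boolean> (the uncontended synchronization Mark expects to be much cheaper than reflection).

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch of a per-class cache for a reflection-based check (hypothetical
// names). The reflective lookup runs at most once per class; subsequent
// calls are served from the cache under a brief, usually uncontended lock.
final class MethodCheckCache {
    private static final Map<Class<?>, Boolean> cache = new IdentityHashMap<>();
    static int reflectionCalls = 0;  // makes the slow path observable

    static boolean overridesToString(Class<?> clazz) {
        synchronized (cache) {
            Boolean cached = cache.get(clazz);
            if (cached != null) {
                return cached;  // fast path: no reflection
            }
        }
        boolean result;
        try {
            reflectionCalls++;
            // The slow reflective check: is toString() overridden?
            result = clazz.getMethod("toString").getDeclaringClass() != Object.class;
        } catch (NoSuchMethodException e) {
            result = false;
        }
        synchronized (cache) {
            cache.put(clazz, result);
        }
        return result;
    }
}
```

IdentityHashMap is a reasonable fit here because Class objects have well-defined identity, so reference equality is both correct and faster than equals()-based lookup.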
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: [email protected]
> >
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:[email protected]]
> >> Sent: Monday, August 10, 2009 4:48 PM
> >> To: [email protected]
> >> Subject: Re: indexing_slowdown_with_latest_lucene_udpate
> >>
> >> Robert Muir wrote:
> >>
> >>> This is real and not just for very short docs.
> >>>
> >> Yes, you still pay the cost for longer docs, but it becomes less
> >> important the longer the docs get, as it plays a smaller role. Load a
> >> ton of one-term docs, and it might be 50-60% slower - add a bunch of
> >> articles, and it might be closer to 15-20% (I don't know the exact
> >> numbers, but the longer I made the docs, the smaller the percentage
> >> slowdown, obviously). Still a good hit, but a short-doc test magnifies
> >> the problem.
> >>
> >> It affects things no matter what, but when you don't do much tokenizing
> >> or normalizing, the cost of the reflection/TokenStream init dominates.
> >>
> >> - Mark
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> >
> >
> >
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 


