There have been a few small comments in the Jira about the reflection in Snowball's Among class. There is very little to be done about this unless one wants to redesign the stemmers so they include an inner class that handles the method callbacks. That's quite a bit of work, and I don't even know how much CPU one would save by doing it.
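To illustrate the difference being discussed, here is a simplified, hypothetical sketch. It is not the actual Among code: `ToyStemmer`, `r_condition`, `ReflectiveEntry` and `CallbackEntry` are invented names. One entry looks a routine up by name and calls it through `java.lang.reflect.Method`. The other receives a plain callback object (a lambda here; in older Java, an anonymous inner class of the stemmer would play the same role).

```java
import java.lang.reflect.Method;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a generated stemmer; not actual Snowball code.
class ToyStemmer {
    int calls = 0;

    // A routine that a lookup-table entry might need to call back into.
    public boolean r_condition() {
        calls++;
        return true;
    }
}

// Reflection style: the routine is looked up by name once, then invoked reflectively.
class ReflectiveEntry {
    private final Method method;
    private final Object target;

    ReflectiveEntry(String methodName, Object target) {
        try {
            this.method = target.getClass().getMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No public method " + methodName, e);
        }
        this.target = target;
    }

    boolean call() {
        try {
            // Method.invoke returns Object, so the boolean result comes back boxed.
            return (Boolean) method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Callback style: the stemmer hands over an object that calls the routine directly.
class CallbackEntry {
    private final BooleanSupplier callback;

    CallbackEntry(BooleanSupplier callback) {
        this.callback = callback;
    }

    boolean call() {
        return callback.getAsBoolean(); // ordinary interface call, no reflection
    }
}
```

Usage would be `new ReflectiveEntry("r_condition", stemmer)` versus `new CallbackEntry(stemmer::r_condition)`. The callback version avoids the reflective dispatch and boxing. How much CPU that actually saves in the stemmers is the open question above.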

So I was thinking that it might save some resources to reuse the stemmers instead of reinstantiating them (which I presume is what everybody does).

I thought it would make the most sense to simulate query-time stemming, so my benchmark contained 4 words, 2 of which are plural. Each test ran 1 000 000 times. The CPU time used is barely noticeable relative to what other things cost: 0.0109 ms/iteration when reinstantiating vs. 0.0067 ms/iteration when reusing.

The heap consumption, however, was rather different. By the end of the reinstantiation run it had consumed about 10x more than when reusing: ~20 MB vs. ~2 MB.
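The benchmark code itself isn't included here. The following is a hypothetical sketch of how such a comparison could be set up. `ToyPluralStemmer` is an invented stand-in with per-instance scratch state, not a Snowball class, and its plural stripping is deliberately crude. A serious measurement would also need JVM warm-up (or a harness such as JMH). Heap usage would have to be observed separately, for example with `-verbose:gc` or a profiler, since this sketch only times the loop.

```java
import java.util.function.Supplier;

// Hypothetical stemmer with mutable per-instance state, standing in for a Snowball program.
class ToyPluralStemmer {
    private final StringBuilder buffer = new StringBuilder(); // scratch space allocated per instance

    String stem(String word) {
        buffer.setLength(0);
        buffer.append(word);
        // Crude plural stripping, only to give the benchmark some work to do.
        if (buffer.length() > 3 && buffer.charAt(buffer.length() - 1) == 's') {
            buffer.setLength(buffer.length() - 1);
        }
        return buffer.toString();
    }
}

class StemmerReuseBenchmark {
    // Four query words, two of them plural.
    static final String[] QUERY = {"cats", "house", "dogs", "tree"};

    // Stems the query `iterations` times; returns average nanoseconds per iteration.
    static double run(int iterations, boolean reuse, Supplier<ToyPluralStemmer> factory) {
        ToyPluralStemmer shared = factory.get(); // only used when reuse == true
        long checksum = 0; // consumed below so the JIT cannot discard the work
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            ToyPluralStemmer stemmer = reuse ? shared : factory.get();
            for (String word : QUERY) {
                checksum += stemmer.stem(word).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        // cat + house + dog + tree = 3 + 5 + 3 + 4 = 15 characters per iteration
        if (checksum != 15L * iterations) {
            throw new IllegalStateException("unexpected stemming result");
        }
        return (double) elapsed / iterations;
    }
}
```

Comparing `run(1_000_000, false, ToyPluralStemmer::new)` with `run(1_000_000, true, ToyPluralStemmer::new)` gives the two per-iteration averages, in nanoseconds (divide by 1e6 for ms/iteration).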


I realize people don't usually run 1 000 000 queries in such a short time, but at least this is an indication that one could save some GC time here. Many a mickle makes a muckle...

So I was thinking that perhaps it would make sense to have something like a singleton concurrent queue in the SnowballFilter, and a new constructor that takes the Snowball program implementation class as an argument.
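A minimal sketch of that idea, assuming one shared queue per implementation class. `StemmerPool`, `borrow` and `release` are hypothetical names and not part of Lucene or Snowball. The sketch relies only on `ConcurrentHashMap` and `ConcurrentLinkedQueue` from `java.util.concurrent`, and on the stemmer class having a no-arg constructor.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical JVM-wide pool of stemmer instances, one lock-free queue per implementation class.
final class StemmerPool {
    private static final Map<Class<?>, Queue<Object>> POOLS = new ConcurrentHashMap<>();

    private StemmerPool() {}

    private static Queue<Object> queueFor(Class<?> type) {
        return POOLS.computeIfAbsent(type, k -> new ConcurrentLinkedQueue<>());
    }

    // Takes an idle instance of the given class, or creates one via its no-arg constructor.
    static <T> T borrow(Class<T> type) {
        Object pooled = queueFor(type).poll();
        if (pooled != null) {
            return type.cast(pooled);
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + type.getName(), e);
        }
    }

    // Hands an instance back for reuse; the caller must not touch it afterwards.
    static <T> void release(Class<T> type, T stemmer) {
        queueFor(type).offer(stemmer);
    }
}
```

A filter constructed with the program class could borrow an instance when it is created and release it once it is done with its token stream. The queue here is unbounded and never shrinks, so it ends up holding roughly the peak number of instances that were ever in use at once.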

But this might also be a case of premature optimization.


         karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
