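Unfortunately this code is not mine, but it is rather simple to try:

    int bloom_filter = 0; // must be initialized before use in Java
    for (char accent : accents) {
        // set one of 32 bits, chosen by the low 5 bits of the character
        bloom_filter = bloom_filter | 1 << (accent & 0x1F);
    }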
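A minimal, runnable sketch of the whole idea, assuming a small hypothetical set of Latin-1 accented letters (`ACCENTS`) and a hypothetical `fold()` method standing in for the huge normalization switch. The filter is built as in the snippet above. Each character is checked against it first, and only on a hit does the code enter the switch, whose `default` branch absorbs false positives.

```java
// Hypothetical demo: a 32-bit "bloom filter" with a single hash (the low 5 bits
// of the char), used as a cheap negative test in front of a large switch.
class AccentFilter {

    // Hypothetical set of accented characters handled by the big switch:
    // à á â ä è é ê ë ñ ö ü
    static final char[] ACCENTS = {
        '\u00E0', '\u00E1', '\u00E2', '\u00E4', '\u00E8', '\u00E9',
        '\u00EA', '\u00EB', '\u00F1', '\u00F6', '\u00FC'
    };

    // Bit i is set if some accented char c has (c & 0x1F) == i.
    static final int BLOOM_FILTER = buildFilter(ACCENTS);

    static int buildFilter(char[] accents) {
        int filter = 0;
        for (char accent : accents) {
            filter |= 1 << (accent & 0x1F);
        }
        return filter;
    }

    // false: definitely not in the set; true: maybe, so do the full work.
    static boolean mightBeAccented(char c) {
        return (BLOOM_FILTER & (1 << (c & 0x1F))) != 0;
    }

    // Hypothetical stand-in for the huge accent -> plain form switch.
    static char fold(char c) {
        if (!mightBeAccented(c)) {
            return c; // fast path: no switch at all
        }
        switch (c) {
            case '\u00E0': case '\u00E1': case '\u00E2': case '\u00E4': return 'a';
            case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB': return 'e';
            case '\u00F1': return 'n';
            case '\u00F6': return 'o';
            case '\u00FC': return 'u';
            default: return c; // false positive: bit was set, char is not accented
        }
    }

    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(fold(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("d\u00E9j\u00E0 vu")); // "déjà vu" folded
    }
}
```

One design caveat of this particular hash: 'a'..'z' (U+0061..U+007A) and the accented letters in U+00E1..U+00FA share their low 5 bits, so in this sketch plain letters such as 'd', 'j' and 'v' (and the space) are false positives and take the slow path. The output stays correct, but how much the filter saves depends on the actual character distribution.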
The rest is easy: before entering the big switch, test each character against the filter and skip the switch when its bit is not set. This works well for about 10-20 characters per bloom_filter, depending on their distribution. You could also try it with a long instead of an int ...

Be careful with JVM options and with different CPUs: Java is making big steps in tuning switch performance, and so are CPUs. We have seen wild differences when changing JVM versions and options (-server -Xbatch) and CPUs...

----- Original Message ----
> From: Andi Vajda <va...@osafoundation.org>
> To: java-dev@lucene.apache.org
> Sent: Friday, 30 January, 2009 23:02:15
> Subject: Re: BloomFilter-s with Lucene
>
>
> On Fri, 30 Jan 2009, eks dev wrote:
>
> > I have used them for speeding up huge switch clauses in charset normalization
> > (e.g. lowercase and accent->plain form mapping). Big number of accented characters
> > (this causes a big switch statement) that appear seldom in the corpus (the big majority
> > being not accented). If the test is negative, you do just a simple array access; if
> > positive, you do the full work with the huge switch statement.
>
> Interesting, this could be used with the fix to LUCENE-1390 then ?
>
> Andi..
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org