Unfortunately this code is not mine, but it is rather simple to try:

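int bloom_filter = 0;  // locals must be initialized in Java
for (char accent : accents) {  // accents: the chars handled by the big switch
    // set the bit chosen by the low 5 bits of the char
    bloom_filter |= 1 << (accent & 0x1F);
}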


The rest is easy. This works well for 10-20 chars per bloom_filter, depending
on the distribution. You could try it with a long ...
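Below is a minimal sketch of how the filter might guard the switch. It is not
the original code: the class AccentFolder, the ACCENTS set, the helpers
buildFilter, mightBeAccented and fold, and the accent-to-plain mappings are all
hypothetical. It assumes a single "hash" (the low 5 bits of the char), so it is
really a one-word bitmask rather than a textbook multi-hash Bloom filter.

```java
/**
 * Hypothetical sketch: a one-int bitmask used as a fast negative test
 * in front of a (here tiny) accent-folding switch.
 */
final class AccentFolder {

    // Hypothetical, deliberately small accent set (Latin-1 code points):
    // a with grave/acute/circumflex/diaeresis, e with grave/acute/circumflex,
    // n with tilde, o with diaeresis, u with diaeresis.
    static final char[] ACCENTS = {
        '\u00e0', '\u00e1', '\u00e2', '\u00e4',
        '\u00e8', '\u00e9', '\u00ea',
        '\u00f1', '\u00f6', '\u00fc'
    };

    // Bit (c & 0x1F) is set for every char in ACCENTS.
    static final int BLOOM_FILTER = buildFilter(ACCENTS);

    static int buildFilter(char[] accents) {
        int filter = 0;
        for (char accent : accents) {
            filter |= 1 << (accent & 0x1F);
        }
        return filter;
    }

    /** false: definitely not in ACCENTS; true: maybe (false positives possible). */
    static boolean mightBeAccented(char c) {
        return (BLOOM_FILTER & (1 << (c & 0x1F))) != 0;
    }

    /** Maps an accented char to its plain form; every other char is returned unchanged. */
    static char fold(char c) {
        if (!mightBeAccented(c)) {
            return c; // fast path: one shift and one AND, no switch
        }
        switch (c) { // slow path: stands in for the huge switch
            case '\u00e0': case '\u00e1': case '\u00e2': case '\u00e4':
                return 'a';
            case '\u00e8': case '\u00e9': case '\u00ea':
                return 'e';
            case '\u00f1':
                return 'n';
            case '\u00f6':
                return 'o';
            case '\u00fc':
                return 'u';
            default:
                return c; // false positive: bit was set, but c is not an accent
        }
    }

    /** Folds every char of s. */
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(fold(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("d\u00e9j\u00e0 vu")); // prints "deja vu"
    }
}
```

A collision only sends a char down the slow path, and the switch's default
branch returns it unchanged, so correctness never depends on the filter; only
speed does. With the low-5-bit hash, plain ASCII letters sharing those bits with
an accent (in this toy set: a, b, d, h, i, j, q and v) are false positives,
which is why the payoff depends on the character distribution.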

Be careful with Java options and different CPUs: Java takes big steps in
tuning switch performance, and so do CPUs. We have seen wild differences by
changing JVM versions, options (-server -Xbatch) and CPUs...


----- Original Message ----
> From: Andi Vajda <va...@osafoundation.org>
> To: java-dev@lucene.apache.org
> Sent: Friday, 30 January, 2009 23:02:15
> Subject: Re: BloomFilter-s with Lucene
> 
> 
> On Fri, 30 Jan 2009, eks dev wrote:
> 
> > I have used them for speeding up huge switch clauses in charset
> > normalization (e.g. lowercasing and accent->plain form mapping). There is a
> > big number of accented characters (which causes a big switch statement)
> > that appear seldom in the corpus (the big majority being not accented). If
> > the test is negative, you do just a simple array access; if positive, you
> > do the full work with the huge switch statement.
> 
> Interesting, this could be used with the fix to LUCENE-1390 then?
> 
> Andi..
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
