Unfortunately this code is not mine, but it is simple enough to try:
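int bloom_filter = 0; // a local must be initialized before it is read
for (char accent : accents) {
    // set one bit per character, indexed by its low 5 bits (0..31)
    bloom_filter |= 1 << (accent & 0x1F);
}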
The rest is easy. This works well for 10-20 chars per bloom_filter, depending
on the distribution of the characters. You could try it with long ...
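
To make "the rest" concrete, here is a minimal sketch of how the test could sit in front of the switch. The class name, the method names and the sample accent set are all made up for illustration; they are not from the original code.

```java
// Hypothetical sketch: a 32-bit, single-hash Bloom filter in front of a switch.
final class AccentFilter {
    // Assumed sample set (a-grave, a-acute, a-circumflex, a-umlaut, e-grave,
    // e-acute, e-circumflex, e-umlaut, o-umlaut, u-umlaut); the real list is longer.
    static final char[] ACCENTS = {
        '\u00E0', '\u00E1', '\u00E2', '\u00E4', '\u00E8',
        '\u00E9', '\u00EA', '\u00EB', '\u00F6', '\u00FC'
    };

    static final int BLOOM_FILTER = buildFilter(ACCENTS);

    // Set one bit per character, indexed by its low 5 bits.
    static int buildFilter(char[] accents) {
        int filter = 0;
        for (char accent : accents) {
            filter |= 1 << (accent & 0x1F);
        }
        return filter;
    }

    // false: definitely not in the set. true: maybe in the set, so check fully.
    static boolean mightBeAccented(char c) {
        return (BLOOM_FILTER & (1 << (c & 0x1F))) != 0;
    }

    static char normalize(char c) {
        if (!mightBeAccented(c)) {
            return c; // fast path: the switch is skipped entirely
        }
        switch (c) {
            case '\u00E0': case '\u00E1': case '\u00E2': case '\u00E4':
                return 'a';
            case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB':
                return 'e';
            case '\u00F6':
                return 'o';
            case '\u00FC':
                return 'u';
            default:
                return c; // false positive: bit was set, but c is not accented
        }
    }
}
```

Note that plain 'a' shares its low 5 bits with a-acute, so it passes the filter and only falls out at the default branch. That is the "depends on distribution" part: the filter only pays off if most of the input lands on bits that are not set. A long-based variant would presumably use 1L << (c & 0x3F) to get 64 bits instead of 32.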
Be careful with Java options and different CPUs: Java makes big steps in
tuning switch performance, and CPUs do as well. We have seen wild differences
from changing JVM versions and options (-server -Xbatch) and CPUs...
----- Original Message ----
> From: Andi Vajda <[email protected]>
> To: [email protected]
> Sent: Friday, 30 January, 2009 23:02:15
> Subject: Re: BloomFilter-s with Lucene
>
>
> On Fri, 30 Jan 2009, eks dev wrote:
>
> > I have used them for speeding up huge switch clauses in charset
> > normalization (eg lowercase and accent->plain form mapping). Big number of
> > accented characters (this causes big switch statement) that appear seldom
> > in corpus (big majority being not accented). If negative test, you do just
> > simple array access, if positive do full work with huge switch statement.
>
> Interesting, this could be used with the fix to LUCENE-1390 then ?
>
> Andi..
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]