>>should be a stupid simple postings format like any other postings format with 
>>a default configuration

It does have a default config. It just needs a delegate PF in the constructor, 
just like Pulsing.
Like Rob said:
>>In other words, it should work just like pulsing.


So far so good.

Now, where people are getting upset (for no particularly good reason, in my 
view) is the per-field stuff. If you really, really want to, you can supply a 
subclass of BloomFilterFactory to your BloomPF constructor. That gives 
customised control over the choice of hashing algorithm, bitset sizing and 
saturation policies if the DefaultBloomFilterFactory fails to make the right 
choices. 99.99999% of people will not do this. The reason it is a factory 
object and not some dumb settings is that it is called on a per-segment basis 
with state info that is useful context when making sizing choices.

Now (horror of horrors), the factory's API is passed a FieldInfo object in the 
method designed to produce a bitset. It is conceivable that some rogue agents 
could choose to implement per-field decisions here if the same BloomPF instance 
were registered to handle more than one field. In addition, BloomPF has some 
common-sense defensive coding that checks whether the factory returns null for 
the bitset, in which case it passes all calls un-bloomed directly to the 
delegate codec.
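
To make the shape of that concrete, here is a rough sketch of the pattern 
only. Every type and method name below (SegmentContext, BitsetFactory, 
bitsetFor, BloomWrapperSketch and so on) is a hypothetical stand-in invented 
for illustration, not the Lucene API, and the sizing rule is an assumption. It 
shows a factory that is consulted per segment and per field, and a wrapper 
that treats a null bitset as "skip blooming, use the delegate only".

```java
// Hypothetical stand-ins for illustration only; these are NOT the Lucene classes.

/** Minimal per-segment context (stand-in for the state info handed to the factory). */
final class SegmentContext {
    final int maxDoc;
    SegmentContext(int maxDoc) { this.maxDoc = maxDoc; }
}

/** Stand-in for the factory: asked once per segment and field for a bitset. */
interface BitsetFactory {
    /** Returns a bitset sized for this segment, or null to leave the field un-bloomed. */
    java.util.BitSet bitsetFor(SegmentContext segment, String fieldName);
}

/** Default policy: bloom every field, sizing the bitset from the segment's doc count. */
final class DefaultBitsetFactory implements BitsetFactory {
    public java.util.BitSet bitsetFor(SegmentContext segment, String fieldName) {
        // Assumed sizing rule for this sketch: about 8 bits per document, at least 64.
        long bits = Math.max(64L, 8L * segment.maxDoc);
        return new java.util.BitSet((int) Math.min(Integer.MAX_VALUE, bits));
    }
}

/** Custom policy: bloom only the "id" field and opt out for everything else. */
final class PrimaryKeyOnlyFactory implements BitsetFactory {
    private final BitsetFactory defaults = new DefaultBitsetFactory();
    public java.util.BitSet bitsetFor(SegmentContext segment, String fieldName) {
        return "id".equals(fieldName) ? defaults.bitsetFor(segment, fieldName) : null;
    }
}

/** Stand-in for the wrapping format: picks a write path per field. */
final class BloomWrapperSketch {
    private final BitsetFactory factory;
    BloomWrapperSketch(BitsetFactory factory) { this.factory = factory; }

    /** Describes which write path a field would take in the given segment. */
    String writePathFor(SegmentContext segment, String fieldName) {
        java.util.BitSet bits = factory.bitsetFor(segment, fieldName);
        // Defensive check: a null bitset means "write straight to the delegate".
        return bits == null ? "delegate-only" : "bloom+delegate";
    }
}
```

With the default factory every field takes the bloomed path; with the custom 
one only "id" does, and nothing about the wrapper's constructor changes.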

None of this prevents the use of BloomPF in the prescribed PerFieldPF manner 
for handling field-specific choices.

I happen to use a custom BloomFilterFactory to implement a more efficient 
indexing pipeline than the prescribed PerFieldPF route of implementing all 
per-field policies "up high" in the stack - but none of that comes at the cost 
of a clean BloomPF API or involves any unnecessary duplication of PerFieldPF 
logic. 

If anything needs changing here, there may be a case for providing a 
convenience class that weds BloomPF to a default choice of Lucene40 codec, so 
it can supply whatever Solr and other config-driven engines may need, i.e. 
zero-arg constructors, if that's how their registry of codecs works.
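
As a rough idea of what such a convenience class could look like, here is a 
sketch using made-up stand-in types (FormatSketch, BaseFormatSketch, 
BloomFormatSketch and BloomOverBaseSketch are all hypothetical names, not 
Lucene classes). The only point is that a subclass can hard-wire the delegate 
so a name-based registry can create it with no arguments.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Lucene classes.

/** Stand-in for a postings format that a registry looks up by name. */
abstract class FormatSketch {
    private final String name;
    FormatSketch(String name) { this.name = name; }
    String name() { return name; }
}

/** Stand-in for the default base format that would sit underneath the bloom layer. */
final class BaseFormatSketch extends FormatSketch {
    BaseFormatSketch() { super("Base"); }
}

/** Stand-in for the bloom wrapper: it always needs a delegate to be supplied. */
class BloomFormatSketch extends FormatSketch {
    private final FormatSketch delegate;
    BloomFormatSketch(String name, FormatSketch delegate) {
        super(name);
        this.delegate = delegate;
    }
    FormatSketch delegate() { return delegate; }
}

/** Convenience class: fixes the delegate so a zero-arg constructor is enough. */
final class BloomOverBaseSketch extends BloomFormatSketch {
    BloomOverBaseSketch() { super("BloomOverBase", new BaseFormatSketch()); }
}
```

A config-driven engine would then only need to know the name "BloomOverBase"; 
anyone wanting a different delegate still uses the general constructor.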

Cheers
Mark

________________________________
 From: Uwe Schindler <u...@thetaphi.de>
To: dev@lucene.apache.org 
Sent: Wednesday, 13 February 2013, 16:47
Subject: RE: New Lucene features and Solr indexes
 
Hi Shawn,

I also argued this at the time it was committed. I fully agree with 
Robert: the current API is not in good shape!
I have the same feeling: Bloom Postings should be a stupid simple postings 
format like any other postings format with a default configuration. If you 
really want to change its configuration, you can subclass it as a separate 
postings format.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Wednesday, February 13, 2013 3:59 PM
> To: dev@lucene.apache.org
> Subject: Re: New Lucene features and Solr indexes
> 
> >> BloomFilterPostingsFormat is a little special compared to other
> >> postings formats because it can wrap any postings format. So maybe it
> >> should require special support, like an additional attribute in the
> >> field type definition?
> >
> > -1
> >
> > Instead of making other APIs to accommodate BloomFilter's current
> > brokenness: remove its custom per-field logic so it works with
> > PerFieldPostingsFormat, like every other PF.
> >
> > In other words, it should work just like pulsing.
> >
> > I brought this up before it was committed, and I was ignored. That's
> > fine, but I'll be damned if I let its incorrect design complicate
> > other parts of the codebase too. I'd rather it continue to stay
> > difficult to integrate and continue walking its current path to an
> > open source death instead.
> 
> Robert,
> 
> I have to send you a general thank you for your dedication to the quality of
> this project, and for your amazing ability to seemingly keep the entire design
> for Lucene in your head at all times.
> 
> I'm not sure what exactly you want to die here, or what you think would be
> the best option for me, the Solr end-user.  Is BloomFilter something that's
> not worth pursuing, or would you just like it to be integrated in a different
> way?
> 
> Thanks,
> Shawn
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org

