On 2010-05-13 23:27, Israel Ekpo wrote:
> Hello Lucene and Solr Community
>
> I have a custom org.apache.lucene.search.Filter that I would like to
> contribute to the Lucene and Solr projects.
>
> So I would need some direction as to how to create an ISSUE or submit a
> patch.
>
> It looks like there have been changes to the way this is done since the
> latest merge of the two projects (Lucene and Solr).
>
> Recently, some Solr users have been looking for a way to perform bitwise
> operations between an integer value and some fields in the Index.
>
> So, I wrote a Solr QParser plugin to do this using a custom Lucene Filter.
>
> This package makes it possible to filter results returned from a query based
> on the results of a bitwise operation on an integer field in the documents
> returned from the pre-constructed query.

Hi,

What a coincidence! :) I'm working on something very similar, only the use case that I need to support is slightly different: I want to support a ranked search based on the bitwise overlap of the query value and the field value. That is, the number of differing bits would reduce the score. This scenario occurs e.g. during near-duplicate detection that uses fuzzy signatures, at the document or sentence level.

I'm going to submit my code early next week; it still needs some polishing. I have two ways to execute this query, neither of which uses filters at the moment:

* method 1: during indexing, the bits in the field are turned into on/off terms on the same field, and during search a BooleanQuery is formed from the int value using the same terms. Scoring is courtesy of BooleanScorer. This method supports only a single int value per field.

* method 2 (not yet complete): during indexing, the bits are turned into terms as before, but this method supports multiple int values per field: terms that correspond to bitmasks on the same value are put at the same position. Then a specialized Query / Scorer traverses all 32 posting lists in parallel, moving through all matching docs and scoring according to how many terms matched at the same position.

I wrapped this in a Solr FieldType, and instead of using a custom QParser plugin I simply implemented FieldType.getFieldQuery().

It would be great to work out a convenient user-level API for this feature, for both the scoring and the non-scoring case.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
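
To make the bit-to-term idea in the two methods above more concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is not the code I plan to submit. The class name, the method names and the "bit:state" term format are all made up for illustration, and the raw match count only stands in for the score, since real scores would also depend on Lucene's similarity weighting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class BitTermsSketch {

    // Turn each of the 32 bits into an on/off term.
    // The "bit:state" format (e.g. "3:1" = bit 3 is set) is an assumption.
    static List<String> toTerms(int value) {
        List<String> terms = new ArrayList<>(32);
        for (int bit = 0; bit < 32; bit++) {
            int on = (value >>> bit) & 1;
            terms.add(bit + ":" + on);
        }
        return terms;
    }

    // Method 1 idea: count how many of the query's terms also occur
    // among the field's terms. Each differing bit lowers the count by one.
    static int matchingTerms(int queryValue, int fieldValue) {
        List<String> fieldTerms = toTerms(fieldValue);
        int matches = 0;
        for (String term : toTerms(queryValue)) {
            if (fieldTerms.contains(term)) {
                matches++;
            }
        }
        return matches;
    }

    // Method 2 idea: with several values per field, each value occupies
    // its own position; here we simply take the best-matching position.
    static int bestPositionMatch(int queryValue, int[] fieldValues) {
        int best = 0;
        for (int value : fieldValues) {
            // Same count as matchingTerms, computed directly on the bits.
            int matches = 32 - Integer.bitCount(queryValue ^ value);
            best = Math.max(best, matches);
        }
        return best;
    }
}
```

For example, matchingTerms(0b1010, 0b1000) gives 31 because the two values differ in exactly one bit, and identical values give 32. Taking the maximum over positions in bestPositionMatch is just one possible choice; a real Scorer could combine the positions differently.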
