Well, you really have the code already <G>. From the top...
1> There's no good way to support searching bitfields. If you wanted, you could probably store the value as a small integer and then search on it, but that's way more complicated than you want.

2> Add the fields as in the snippet you quoted, something like

    Document doc = new Document();
    // One untokenized marker field per set bit (Lucene 2.x Field API).
    if ((bitsfromdb & 1) != 0) {
        doc.add(new Field("sport", "y", Field.Store.NO, Field.Index.UN_TOKENIZED));
    }
    if ((bitsfromdb & 2) != 0) {
        doc.add(new Field("music", "y", Field.Store.NO, Field.Index.UN_TOKENIZED));
    }
    . . .
    writer.addDocument(doc);  // writer is your IndexWriter

Now, when searching, search on things like new TermQuery(new Term("sport", "y")) and you'll only get the documents that correspond to the 1s bit being set. Watch out for capitalization. Y may not be equivalent to y. It depends on the analyzer you use at index AND search time.

You can OR as many of these together as you want. In your example, you could have up to 4 sub-clauses just for the bitmask equivalents.

NOTE: the documents won't all have the same fields. A document may not have, for instance, the "sport" field. This is OK in Lucene, but not the first thing folks with their DB hat on think of....

Get a copy of Luke (google lucene luke) and get familiar with it for examining your index and the effects of various analyzers. Really, really, really get a copy of Luke. Really.

Do you have a copy of "Lucene In Action"? If not, I highly recommend it. It has tons of useful examples as well as a good introduction to many of the concepts. It's written to the 1.4 codebase, so be warned that there are some incompatibilities that are, for the most part, minor.

Best
Erick

On 11/27/06, Biggy <[EMAIL PROTECTED]> wrote:
I have the same problem here. I have an interest bit field, which I receive from the application backend. I have control over how the documents are built. To be specific, the field looks like this:

ID : interest
1 : sport
2 : music
4 : film
8 : clubs

So someone interested in sports and music can be found by "interest & 3", e.g. when using SQL. I do not wish to post-filter the results.

On to Lucene: is there a filter which supports this kind of query?

Someone suggested splitting the bits into fields:

> Document doc = new Document();
> doc.add("flag1", "Y");
> doc.add("flag2", "Y");
> IndexWriter.add(doc);

Is this helpful at all? Code would be helpful too, as I am a newbie.

ltaylor.employon wrote:
>
> Hello,
>
> I am currently evaluating Lucene to see if it would be appropriate to
> replace my company's current search software. So far everything has been
> looking great, however there is one requirement that I am not too
> certain about.
>
> What we need to do is to be able to store a bit mask specifying various
> filter flags for a document in the index and then search this field by
> specifying another bit mask with desired filters, returning documents
> that have any of the specified flags set. In other words, we are doing a
> bitwise AND on the stored filter bit mask and the specified filter bit
> mask and if it is non-zero, we want to return the document.
>
> Before I started toying around with various options myself, I wanted to
> see if any of you good folks in the Lucene community had some
> suggestions for an efficient way to implement this.
>
> We currently need to index ~8,000,000 documents. We have several filter
> flag fields, the most important of which currently has 7 possible flags
> with any combination of the flags being valid. The number of flags is
> expected to increase rather rapidly in the near future.
>
> My preemptive thanks for your suggestions,
>
>
> Lawrence Taylor
> Senior Software Engineer
> Employon

--
View this message in context: http://www.nabble.com/Searching-by-bit-masks-tf2603918.html#a7564237
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
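
For anyone following along, here is a minimal, library-free sketch of the field-per-flag idea discussed in this thread. It only shows how a bitmask could be decomposed into field names and turned into an OR query string in Lucene's query syntax. The class and method names (InterestFlags, fieldsFor, anyOfQuery) are made up for illustration, and the flag names and bit values are the ones from Biggy's message. Real indexing and searching would still go through Lucene's own Document and IndexWriter classes, which are not used here.

```java
import java.util.ArrayList;
import java.util.List;

public class InterestFlags {
    // Flag names from the thread; index i corresponds to bit value (1 << i):
    // 1 = sport, 2 = music, 4 = film, 8 = clubs.
    static final String[] NAMES = {"sport", "music", "film", "clubs"};

    /** Returns one field name for every bit that is set in the mask. */
    static List<String> fieldsFor(int mask) {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if ((mask & (1 << i)) != 0) {
                fields.add(NAMES[i]);
            }
        }
        return fields;
    }

    /**
     * Builds a query string matching documents that have ANY of the
     * requested flags, assuming each flag was indexed as field:"y".
     */
    static String anyOfQuery(int mask) {
        List<String> clauses = new ArrayList<>();
        for (String field : fieldsFor(mask)) {
            clauses.add(field + ":y");
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(fieldsFor(3));   // [sport, music]
        System.out.println(anyOfQuery(3));  // sport:y OR music:y
    }
}
```

At index time, the names returned by fieldsFor would be the fields added to each document. At search time, a string like the one from anyOfQuery is one way to express "any of these bits set". Joining the clauses with AND instead would require all of the bits.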