OK, here is what I've come up with after reading your suggestions:
- The bit set from the DB stays untouched.
- Only one field shall be used to store the interest bits in the document:
"interest". This saves disk space.
- The bits shall not be converted to readable strings but added as values
separated by a space " ".
====Code Below====
-----------------
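// Assumes the Lucene 2.x Field API (Field.Store / Field.Index); adjust for your version.
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public Document getDocument(int db_interest_bits)
{
    String interest_string = "";
    // sport
    if ((db_interest_bits & 1) != 0) { // Java needs an explicit boolean test
        interest_string += "1" + " "; // space as delimiter
    }
    // music
    if ((db_interest_bits & 2) != 0) {
        interest_string += "2" + " "; // space as delimiter
    }
    Document doc = new Document();
    // Document.add() takes a Field; TOKENIZED lets the analyzer split the value.
    // Store.NO only indexes the value; use Store.YES if it must be retrievable.
    doc.add(new Field("interest", interest_string.trim(),
                      Field.Store.NO, Field.Index.TOKENIZED));
    // how do i tell Lucene to separate tokens on search ?
    return doc;
}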
---------------
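
The bit-to-token conversion above can also be written as a loop over the known bit values. Below is a minimal sketch in plain Java with no Lucene dependency. The class InterestTokens, its method names, and the value 4 for "film" are assumptions made for illustration. The tokenize method only imitates what splitting on whitespace does; in Lucene that job would belong to an analyzer that splits on whitespace (WhitespaceAnalyzer, for instance), used both at index time and at search time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class InterestTokens {

    // Bit values from the post: 1 = sport, 2 = music. 4 = film is an assumed value.
    static final int[] INTEREST_BITS = {1, 2, 4};

    // Builds the space-separated field value; each set bit becomes one token.
    static String toFieldValue(int dbInterestBits) {
        StringBuilder sb = new StringBuilder();
        for (int bit : INTEREST_BITS) {
            if ((dbInterestBits & bit) != 0) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(bit);
            }
        }
        return sb.toString();
    }

    // Splits on runs of whitespace, imitating a whitespace tokenizer.
    static List<String> tokenize(String fieldValue) {
        List<String> tokens = new ArrayList<>();
        for (String t : fieldValue.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = toFieldValue(3); // sport + music
        System.out.println(value + " -> " + tokenize(value)); // prints "1 2 -> [1, 2]"
    }
}
```

With tokens like these in the "interest" field, a search for the single term "2" would be the lookup for "interested in music".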
FURTHERMORE - I realized that in most records almost all potential values
are set, e.g.:
sport music film
sport music
sport music film
sport music film
sport music
music
So I was thinking: how about doing the reverse when building
the index?
I would only store the flags that are not set.
The search would then be a negation.
Example values of "interest":
1. "no_film" => only film is not set
2. "no_sport no_film" => film and sport are not set
3. "" => all values are set, since this is a negation
It follows that searching for people interested in music becomes:
=> search for NOT no_music
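
A minimal sketch of this inverted encoding in plain Java (no Lucene dependency). The class NegatedInterests, the method name, and the bit value 4 for "film" are assumptions made for illustration; only the bits for sport (1) and music (2) come from the code earlier in this mail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: emits a "no_<name>" token for every interest bit that is NOT set.
class NegatedInterests {

    static final Map<Integer, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put(1, "sport");
        NAMES.put(2, "music");
        NAMES.put(4, "film"); // assumed bit value
    }

    static String toNegatedFieldValue(int dbInterestBits) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : NAMES.entrySet()) {
            if ((dbInterestBits & e.getKey()) == 0) { // flag is unset
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append("no_").append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + toNegatedFieldValue(3) + "]"); // prints "[no_film]"
        System.out.println("[" + toNegatedFieldValue(2) + "]"); // prints "[no_sport no_film]"
        System.out.println("[" + toNegatedFieldValue(7) + "]"); // prints "[]"
    }
}
```

The three calls in main reproduce the three example values listed above.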
QUESTION
How does the performance of a negative search (NOT) compare to a normal one,
i.e.
"NOT no_music" vs. "music", under the premise that most interest flags
are set?
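
To make the comparison concrete, here is a toy model in plain Java of what the two searches have to compute. It is only a sketch under stated assumptions: the class NegationDemo and its methods are hypothetical, the postings are plain java.util.BitSet objects rather than anything Lucene uses internally, and it says nothing about real timings. It only shows that a NOT starts from the set of all documents and subtracts the postings of the negated term. As far as I know, a Lucene query that consists only of a prohibited clause matches nothing by itself, so "NOT no_music" would have to be combined with a positive clause such as MatchAllDocsQuery.

```java
import java.util.BitSet;

// Toy inverted index: one BitSet of document ids per term.
class NegationDemo {

    // Builds a postings list from the given document ids.
    static BitSet postings(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    // NOT term: every document id in [0, docCount) that is absent from the postings.
    static BitSet not(BitSet termPostings, int docCount) {
        BitSet result = new BitSet(docCount);
        result.set(0, docCount);      // start from "all documents"
        result.andNot(termPostings);  // subtract the negated term
        return result;
    }

    public static void main(String[] args) {
        // The six sample records from this mail, numbered 0..5:
        // 0: sport music film, 1: sport music, 2: sport music film,
        // 3: sport music film, 4: sport music, 5: music
        int docCount = 6;
        BitSet film = postings(0, 2, 3);   // positive encoding
        BitSet noFilm = postings(1, 4, 5); // negated encoding

        System.out.println("film       = " + film);                  // prints "film       = {0, 2, 3}"
        System.out.println("NOT no_film = " + not(noFilm, docCount)); // prints "NOT no_film = {0, 2, 3}"
    }
}
```

Both encodings return the same documents; the difference is that the negated form has to touch the "all documents" set in addition to the (short) postings list of the negated term.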
---------
Daniel Noll-3 wrote:
>
> Erick Erickson wrote:
>> Well, you really have the code already <G>. From the top...
>>
>> 1> there's no good way to support searching bitfields. If you wanted, you
>> could probably store it as a small integer and then search on it, but
>> that's waaay more complicated than you want.
>>
>> 2> Add the fields like in the snippet you posted, something like
>> Document doc = new Document();
>> if ((bitsfromdb & 1) != 0) {
>> doc.add("sport", "y");
>> }
>> if ((bitsfromdb & 2) != 0) {
>> doc.add("music", "y");
>> }
>
> Beware that if there are a large number of bits, this is going to impact
> memory usage due to there being more fields.
>
> Perhaps a better way would be to use a single "bits" field and store the
> words "sport", "music", ... in that field.
>
> Daniel
>
>
> --
> Daniel Noll
>
> Nuix Pty Ltd
> Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph: +61 2 9280 0699
> Web: http://nuix.com/ Fax: +61 2 9212 6902
--
View this message in context:
http://www.nabble.com/Searching-by-bit-masks-tf2603918.html#a7576286
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]