RE: Multi Field search without Multifieldqueryparser

Dino Korah Tue, 23 Sep 2008 06:58:46 -0700

Just an idea... Along winded one. I'm not sure either.! Pardon me if I am
directing you in the wrong direction



If you add a lucene doc like below into your main index

- Doc 1 -
Field1: rainy today
Field2: rainy yesterday
Field3: weather forcast for tomorrow

- Doc 2 -
Field1: rainy tomorrow
Field2: rainy today
Field3: weather forcast for today


... etc


And if you create something like an inverted index like below

- Doc 1 -
Field: Field1
Value: rainy today

- Doc 2 -
Field: Field2
Value: rainy yesterday

- Doc 3 -
Field: Field3
Value: weather forcast for tomorrow

- Doc 4 -
Field: Field1
Value: rainy tomorrow

- Doc 5 -
Field: Field2
Value: rainy today

- Doc 6 -
Field: Field3
Value: weather forcast for today

And if you run a query on the inverted index to find out the field that is
most probably to match the text you are about to search for in the main
index, I have a feeling that this might work.



-----Original Message-----
From: Umesh Prasad [mailto:[EMAIL PROTECTED] 
Sent: 23 September 2008 13:58
To: java-user@lucene.apache.org
Subject: Re: Multi Field search without Multifieldqueryparser

On Tue, Sep 23, 2008 at 5:28 PM, Grant Ingersoll <[EMAIL PROTECTED]>wrote:

> So, the piece I'm missing is how do you know what field for which terms.
>  In other words how do you know xyz goes against organization and abc 
> against name.  Your wording implies that you don't know this before 
> hand,


          I guess this would be the case. The free flowing text search leads
to this issue.


> yet you are somehow suggesting that Lucene should be able to do it.
>  Correct me if I'm wrong.

      I am not sure if Lucene will be able to directly able to do it.
However Indexed Terms in Lucene can certainly be used in learning the field
of a particular word/token.
      One way, would be Lucene Index can be traversed to generated a
Learning System which will be later used to learn the field name of a
particular system. I suggest traversing the termDocs and extracting out the
words and field information which can be stored in a separate DB/Index
(Learning System). This system can then be queried 1st to determine the
field type of word. The additional time that the Learning System will
require should be compensated by having a smaller Index Size.



Thanks
Umesh




>
> -Grant
>
>
>
> On Sep 23, 2008, at 6:51 AM, Anshul jain wrote:
>
>  Here is what I'm trying to do:
>>
>> say a lucene document:
>> name: abc ^10
>> organization: xyz ^3
>>
>> ^10 and ^3 are boosts in the document.
>>
>> now if I query name: abc ^5 AND organization: xyz this will work.
>>
>> but if I query (default_field): abc^5 AND xyz this won't work.
>>
>> Now what I want is that a text can be associated with more than one
field.
>> i.e.
>>
>> (field1,field2,field3):value
>> name,(default_field),title: abc^10
>> organization,(default_field),institute: xyz^3
>>
>> then both of my queries will work.
>>
>> Is it possible to do so in lucene without changing the source?
>> If no then can anyone please explain the indexing and searching 
>> mechanism for lucene, so that I can start working on it.
>>
>> The solution given by the java-users won't work for me as I do not 
>> want to add all the contents of the document in a single field and 
>> then search for that field, as this would increase the index size and 
>> I've to index more than 10 million documents. Also 
>> multifieldqueryparser will make it query execution inefficient, as 
>> there will be thousands of fields.
>>
>> If I start storing just a single field as: (default_field): "name abc 
>> organization xyz", then it is possible that some other documents 
>> might get selected that are not relevant. Also i want to boost 
>> individual fields in a document.
>>
>> Anshul
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Multi Field search without Multifieldqueryparser

Reply via email to