Re: [CLucene-dev] Finding a specific document in the index

Michel Nadeau Mon, 14 Sep 2009 10:33:39 -0700

Hi Isidor!

Thanks for the help!

Would you mind giving me a quick example of how to use the BooleanQuery with two fields?


Thank you very much,

Mike

clucene-developers@lists.sourceforge.net wrote:




*Follow-up by* Sep 14, 2009 11:42 AM

Hi Mike,

> All records have a "tag" field containing the word "tag", that's why I'm
> searching for tag:tag (to find all records and then apply filters).
>
> Is there a better way to do this? Because that way, it's so slow that we
> can't consider using that system in production.
>
Searching for a term that matches all documents is almost always the worst
solution. In your case, it means first retrieving every document and then
checking, one by one, whether it passes the filters. A more efficient
approach (and one I'm using successfully myself) is to run a
BooleanQuery that matches the two indexed fields PK and TYPE, and to take
tag:tag out of the query altogether. This way the index entries for the two
terms are used directly, so no CPU cycles are spent on documents that
match neither PK nor TYPE.
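To illustrate why this helps, here is a toy model in plain C++11. It is not CLucene's API: `ToyIndex`, `scanAndFilter` and `intersectTerms` are hypothetical names. It assumes each `field:value` term maps to a sorted list of document IDs (a posting list), and it uses `std::lower_bound` as a rough stand-in for the skipping a real conjunctive query can do. The first function mimics the tag:tag query plus filters; the second ANDs the PK and TYPE terms directly, driving the loop from the rarer term.

```cpp
// Toy inverted index (hypothetical, NOT CLucene's API): each "field:value"
// term maps to a sorted list of document IDs.
#include <algorithm>
#include <cassert>  // for assert() in checks
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<int> Postings;                   // sorted document IDs
typedef std::map<std::string, std::string> Fields;   // field name -> value

struct ToyIndex {
    std::vector<Fields> docs;               // stored fields, indexed by doc ID
    std::map<std::string, Postings> terms;  // "field:value" -> doc IDs

    void add(const Fields& fields) {
        int id = static_cast<int>(docs.size());
        docs.push_back(fields);
        for (const auto& kv : fields)
            terms[kv.first + ":" + kv.second].push_back(id);  // IDs only grow, so lists stay sorted
    }

    const Postings& postings(const std::string& term) const {
        static const Postings empty;
        auto it = terms.find(term);
        return it == terms.end() ? empty : it->second;
    }
};

// Like the original code: match every document via tag:tag, then test each
// one against the PK and TYPE filters. `examined` counts documents checked.
Postings scanAndFilter(const ToyIndex& idx, const std::string& pk,
                       const std::string& type, int& examined) {
    Postings out;
    examined = 0;
    for (int id : idx.postings("tag:tag")) {
        ++examined;
        const Fields& d = idx.docs[id];
        auto p = d.find("PK");
        auto t = d.find("TYPE");
        if (p != d.end() && t != d.end() && p->second == pk && t->second == type)
            out.push_back(id);
    }
    return out;
}

// Suggested approach: AND the terms PK:pk and TYPE:type. Walk the shorter
// posting list and binary-search forward in the longer one, so documents
// matching neither term are never touched. `probes` counts loop steps.
Postings intersectTerms(const ToyIndex& idx, const std::string& pk,
                        const std::string& type, int& probes) {
    const Postings* a = &idx.postings("PK:" + pk);
    const Postings* b = &idx.postings("TYPE:" + type);
    if (a->size() > b->size()) std::swap(a, b);  // drive from the rarer term
    Postings out;
    probes = 0;
    Postings::const_iterator from = b->begin();
    for (int id : *a) {
        ++probes;
        from = std::lower_bound(from, b->end(), id);  // skip ahead to id
        if (from == b->end()) break;
        if (*from == id) out.push_back(id);
    }
    return out;
}
```

For example, with 1,000 documents where PK cycles through 500 values and TYPE is "A" for the first half and "B" for the second, looking up PK 42 / TYPE B makes `scanAndFilter` examine all 1,000 documents, while `intersectTerms` needs only 2 probes (the PK:42 list has two entries); both return document 542.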

Best regards,

Isidor

------------------------------------------------------------------------------

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers



*Opened by* Sep 14, 2009 9:37 AM

Hi guys!


We are building a system using only Lucene (no MySQL). The system
analyzes A LOT of input files and inserts about a dozen types of records
into the index. Each record has at least a PK and a TYPE field, and the
combination of the two makes it unique. Our problem is that the same
record can appear in several files, so each time we need to insert a
document into the index, we first need to check whether the record is
already there (using PK and TYPE), and this is VERY slow (at least the way
I did it). Here's the code I use to search the index to verify whether my
record is already there:

//*****
// Prepare filters
//*****

// Initialize
Filter *cluCF = NULL;
Filter *cluFilters[3]; // 2 filters + NULL terminator

// Type
strcpy(sFieldName, "TYPE");
strcpy(sFieldValue, sTmpType);
STRCPY_AtoT(tField, sFieldName, sizeof(sFieldName));
STRCPY_AtoT(tValue, sFieldValue, sizeof(sFieldValue));
cluFilters[0] = _CLNEW QueryFilter(QueryParser::parse(tValue,
tField, &cluKwdAn));

// PK
strcpy(sFieldName, "PK");
strcpy(sFieldValue, sTmpPK);
STRCPY_AtoT(tField, sFieldName, sizeof(sFieldName));
STRCPY_AtoT(tValue, sFieldValue, sizeof(sFieldValue));
cluFilters[1] = _CLNEW QueryFilter(QueryParser::parse(tValue,
tField, &cluKwdAn));

// Null terminator
cluFilters[2] = NULL;

// Combine filters
cluCF = _CLNEW ChainedFilter(cluFilters, ChainedFilter::AND);

//*****
// Find document
//*****

// Prepare query (the query string is constant; sTmpPK is not used here)
strcpy(sTmp, "tag:tag");
STRCPY_AtoT(tField, sTmp, sizeof(sTmp));

// Search
cluQuery = QueryParser::parse(tField, _T("content"), &cluStdAn);
cluHits = cluSearcher->search(cluQuery, cluCF);

// Document found in the index?
if ( cluHits->length() == 0 )
{
    // No match: insert the document...
    ...
}


All records have a "tag" field containing the word "tag"; that's why I'm
searching for tag:tag (to match all records and then apply the filters).

Is there a better way to do this? Done this way, it's so slow that we
can't consider using the system in production.

Thank you very much,

Mike
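The workflow from the first message (check PK and TYPE, insert only when nothing matches) can be sketched with a toy in-memory term index, using the two-term AND from Isidor's reply instead of a tag:tag scan. This is plain C++11, not CLucene; `Record`, `DedupIndex`, `contains` and `insertIfAbsent` are hypothetical names, and only the PK and TYPE terms are indexed.

```cpp
// Toy model of insert-if-absent (hypothetical names, NOT CLucene's API).
#include <algorithm>
#include <cassert>  // for assert() in checks
#include <iterator>
#include <map>
#include <string>
#include <vector>

struct Record { std::string pk, type; };

class DedupIndex {
public:
    // True if some document has both PK:pk and TYPE:type (a two-term AND).
    bool contains(const Record& r) const {
        const std::vector<int>& a = postings("PK:" + r.pk);
        const std::vector<int>& b = postings("TYPE:" + r.type);
        std::vector<int> both;
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                              std::back_inserter(both));
        return !both.empty();
    }

    // Adds the record only when (PK, TYPE) is not indexed yet.
    // Returns true if the record was added.
    bool insertIfAbsent(const Record& r) {
        if (contains(r)) return false;
        int id = nextId_++;
        terms_["PK:" + r.pk].push_back(id);  // IDs only grow, so lists stay sorted
        terms_["TYPE:" + r.type].push_back(id);
        return true;
    }

    int size() const { return nextId_; }

private:
    const std::vector<int>& postings(const std::string& term) const {
        static const std::vector<int> empty;
        auto it = terms_.find(term);
        return it == terms_.end() ? empty : it->second;
    }

    std::map<std::string, std::vector<int>> terms_;  // "field:value" -> doc IDs
    int nextId_ = 0;
};
```

One difference from a real index: here an inserted record is visible to the very next `contains` call, whereas a Lucene searcher generally sees only documents committed before it was opened, so duplicates within a single batch may need separate handling.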



        

        