Hi guys!

We are building a system that uses only Lucene (no MySQL). It analyzes A LOT of input files and inserts about a dozen types of records into the index. Each record type has at least a PK and a TYPE field, and together these make the record unique. Our problem is that records are not unique from file to file, so each time we need to insert a document we first have to check whether the record is already in the index (using PK and TYPE), and this is VERY slow (the way I did it, at least). Here's the code I use to search the index to verify whether my record is already there:

   //*****
   // Prepare filters
   //*****

   // Initialize
   Filter *cluCF = NULL;
   Filter *cluFilters[3]; // 2+1

   // Type
   strcpy(sFieldName, "TYPE");
   strcpy(sFieldValue, sTmpType);
   STRCPY_AtoT(tField, sFieldName, sizeof(sFieldName));
   STRCPY_AtoT(tValue, sFieldValue, sizeof(sFieldValue));
   cluFilters[0] = _CLNEW QueryFilter(QueryParser::parse(tValue, tField, &cluKwdAn));

   // PK
   strcpy(sFieldName, "PK");
   strcpy(sFieldValue, sTmpPK);
   STRCPY_AtoT(tField, sFieldName, sizeof(sFieldName));
   STRCPY_AtoT(tValue, sFieldValue, sizeof(sFieldValue));
   cluFilters[1] = _CLNEW QueryFilter(QueryParser::parse(tValue, tField, &cluKwdAn));

   // Null terminator
   cluFilters[2] = NULL;

   // Combine filters
   cluCF = _CLNEW ChainedFilter(cluFilters, ChainedFilter::AND);

   //*****
   // Find document
   //*****

   // Prepare query
   // Match-all query: every record carries tag=tag (no format arguments needed)
   strcpy(sTmp, "tag:tag");
   STRCPY_AtoT(tField, sTmp, sizeof(sTmp));

   // Search
   cluQuery = QueryParser::parse(tField, _T("content"), &cluStdAn);
   cluHits  = cluSearcher->search(cluQuery, cluCF);

   // Document found in the index?
   if ( cluHits->length() == 0 )
   {
       // Insert document...
   ...


All records have a "tag" field containing the word "tag", that's why I'm searching for tag:tag (to find all records and then apply filters).
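
For reference, the behavior the code above is aiming for is insert-if-absent on the (TYPE, PK) pair. The following is only a minimal sketch of that logic in standard C++, with a std::set standing in for the index. The names RecordKey, SeenKeys and insertIfAbsent are made up for illustration and are not part of CLucene, and the sketch says nothing about how the real index stores or finds documents.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the index: the (TYPE, PK) pairs already stored.
using RecordKey = std::pair<std::string, std::string>;
using SeenKeys  = std::set<RecordKey>;

// Returns true if the record was new and has now been recorded,
// false if a record with the same TYPE and PK was already present.
bool insertIfAbsent(SeenKeys &seen, const std::string &type, const std::string &pk)
{
    // std::set::insert reports whether the element was actually inserted.
    return seen.insert(RecordKey(type, pk)).second;
}
```

Two records count as duplicates only when both TYPE and PK match, so the same PK under a different TYPE is still inserted.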

Is there a better way to do this lookup? As it stands, the check is so slow that we can't consider using the system in production.

Thank you very much,

Mike