The approach I am currently using is (pseudocode):

  countChangedOrNew = select count(*) from docs
      where date_modified > lastIndexRunDate

  if ((countChangedOrNew / reader.numDocs()) > 50%)
  {
      // quicker to rebuild the whole index
      wipeIndex;
      select * from docs
      for (each record)
      {
          writer.addDoc(new Doc(record));
      }
  }
  else
  {
      // patch the data

      // first delete the changed docs already in the index
      select id from docs where
          date_modified > lastIndexRunDate
      for (each id)
      {
          reader.delete(new Term("dbkey", id));
      }
      reader.close();

      // now add the new/changed docs
      select * from docs where
          date_modified > lastIndexRunDate
      for (each record)
      {
          writer.addDoc(new Doc(record));
      }
  }
  save lastIndexRunDate;
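
In case it helps, here is roughly what that "patch the data" branch looks like in real Java against a Lucene 1.9/2.x-style API. The table/column names (docs, dbkey, date_modified), the conn/analyzer/indexDir variables and the buildDocument() helper are just placeholders carried over from the pseudocode, not a real schema:

  // Drop the stale copy of every changed row, then re-add it from the DB.
  IndexReader reader = IndexReader.open(indexDir);
  PreparedStatement ps = conn.prepareStatement(
      "select id from docs where date_modified > ?");
  ps.setTimestamp(1, lastIndexRunDate);
  ResultSet rs = ps.executeQuery();
  while (rs.next()) {
      reader.deleteDocuments(new Term("dbkey", rs.getString("id")));
  }
  rs.close();
  reader.close();                       // deletes are flushed on close

  IndexWriter writer = new IndexWriter(indexDir, analyzer, false); // false = append
  ps = conn.prepareStatement(
      "select * from docs where date_modified > ?");
  ps.setTimestamp(1, lastIndexRunDate);
  rs = ps.executeQuery();
  while (rs.next()) {
      writer.addDocument(buildDocument(rs));   // one Lucene Document per row
  }
  rs.close();
  writer.optimize();
  writer.close();

One thing to watch: the "dbkey" field has to be indexed as an untokenized keyword field, otherwise deleting by Term won't match anything.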


We've found there are database-specific JDBC streaming
settings that help when reading huge volumes of
records.
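
For example, with the MySQL Connector/J driver the result set is only streamed row by row with this exact combination (most other drivers just honour a positive setFetchSize() hint, and PostgreSQL additionally wants autoCommit switched off):

  // Streaming setup for MySQL Connector/J; adjust per driver.
  Statement stmt = conn.createStatement(
      ResultSet.TYPE_FORWARD_ONLY,
      ResultSet.CONCUR_READ_ONLY);
  stmt.setFetchSize(Integer.MIN_VALUE);   // signal the driver to stream rows
  ResultSet rs = stmt.executeQuery("select * from docs");

Without something like this, some drivers materialise the whole result set in memory before returning the first row.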



--- N <[EMAIL PROTECTED]> wrote:

> Hi
> 
> I am indexing database tables with huge amounts of data
> via Lucene. Do I need to reindex the whole table(s) as
> changes are made to keep the search up to date? Since
> it is time-consuming to create a new index from scratch
> every time the data in the tables is modified, can
> anybody suggest a workaround or a more efficient
> method?
> 
> Thanks in advance
> Noon



                