Hey there, you might have to implement a some kind of unique identifier using an indexed lucene field. When you are indexing you should fire a query with the uuid of your document (maybe the path to you pdf document) and check if the document is in the index already. You could also do a boolean query combining UUID, timestamp and / or a hash value to see if the document has been changed. if so you can simply update the document by its UUID (something like indexwriter.updateDocument(new Term("uuid", value),document);)
Unfortunately you have to implement this yourself but it should not be that much of a deal. simon On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan < vijay.raghava...@gmail.com> wrote: > Dear all, > I am using lucene 3.0 to index the pdf reports that I generate > dynamically. I index the pdf file name (without extension), file path > and its absolute path as fields. I search with the file name without > extension; it retrieves a list, as usually 2 or more files are present > in the same name in different sub directories. As I create the index > for the first time it updates, assuming 100 pdf files in different > directories, the files meta info. If again I do indexing, while my > report generator scheduler has the produced 500 more pdf files > totaling to 600 files in different directories, I wish to index only > the new files to the index. But presently it’s doing the whole thing > again (600 files). How to implement this functionality? Think of the > thousands of pdf files created on each run. > > P.S: I cannot keep the meta-info of generated pdf files in the java > memory, as it exceeds thousands in a single run, and update the index > looping this list. > > new IndexWriter(FSDirectory.open(this.indexDir), new StandardAnalyzer( > Version.LUCENE_CURRENT), true, > IndexWriter.MaxFieldLength.LIMITED); > > is the boolean parameter is for this purpose? Please guide me. > > -- > Thanks > Vijay Veeraraghavan > > > > -- > Thanks & Regards > Vijay Veeraraghavan > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >