Indexing only newly created files

Vijay Veeraraghavan Mon, 03 May 2010 00:22:04 -0700

Dear all,
I am using lucene 3.0 to index the pdf reports that I generate
dynamically. I index the pdf file name (without extension), file path
and its absolute path as fields. I search with the file name without
extension; it retrieves a list, as usually 2 or more files are present
in the same name in different sub directories. As I create the index
for the first time it updates, assuming 100 pdf files in different
directories, the files meta info. If again I do indexing, while my
report generator scheduler has the produced 500 more pdf files
totaling to 600 files in different directories, I wish to index only
the new files to the index. But presently it’s doing the whole thing
again (600 files). How to implement this functionality? Think of the
thousands of pdf files created on each run.


P.S: I cannot keep the meta-info of generated pdf files in the java
memory, as it exceeds thousands in a single run, and update the index
looping this list.

new IndexWriter(FSDirectory.open(this.indexDir), new StandardAnalyzer(
                                        Version.LUCENE_CURRENT), true,
                                        IndexWriter.MaxFieldLength.LIMITED);

is the boolean parameter is for this purpose? Please guide me.

-- 
Thanks
Vijay Veeraraghavan



-- 
Thanks & Regards
Vijay Veeraraghavan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Indexing only newly created files

Reply via email to