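If you sort your old index file by filename and then iterate over the sorted file, all pairs for a given file will be adjacent. You can then build and add each file's Document as soon as you reach the next filename, without holding the whole index in memory or updating Documents afterwards. Your problem is solved, no?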
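Here is a minimal sketch of the suggestion above. It assumes the pairs have already been sorted by fileName (for example with an external sort, since the file is large) and are available as an Iterator. The names SortedPairGrouper, Pair and group are hypothetical. The sink callback stands in for the Lucene indexing step, which appears only as a comment because the Field classes differ between Lucene versions (the comment uses the Lucene 4+ StringField/TextField API). Only one file's words are buffered at a time, so no Document ever has to be updated after it is added.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class SortedPairGrouper {

    /** One <word, fileName> pair from the existing index (hypothetical type). */
    public static final class Pair {
        final String word;
        final String fileName;

        public Pair(String word, String fileName) {
            this.word = word;
            this.fileName = fileName;
        }
    }

    /**
     * Streams pairs that are ASSUMED to be sorted by fileName and calls sink once per
     * file with the file name and its words joined by single spaces. Only the current
     * file's words are held in memory. Duplicate words are kept, so a whitespace
     * tokenizer would see the same term frequencies as the original pairs.
     * If the input is not sorted, a file may be emitted more than once.
     */
    public static void group(Iterator<Pair> sortedPairs, BiConsumer<String, String> sink) {
        String currentFile = null;
        StringBuilder words = new StringBuilder();
        while (sortedPairs.hasNext()) {
            Pair p = sortedPairs.next();
            if (currentFile != null && !currentFile.equals(p.fileName)) {
                // fileName changed: the previous file is complete, so emit it.
                sink.accept(currentFile, words.toString());
                words.setLength(0);
            }
            currentFile = p.fileName;
            if (words.length() > 0) {
                words.append(' ');
            }
            words.append(p.word);
        }
        if (currentFile != null) {
            sink.accept(currentFile, words.toString()); // flush the last file
        }
    }

    public static void main(String[] args) {
        // The example pairs from the question, after sorting by fileName.
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair("my", "file1"));
        pairs.add(new Pair("my", "file1"));
        pairs.add(new Pair("you", "file2"));
        pairs.add(new Pair("my", "file2"));

        group(pairs.iterator(), (fileName, text) -> {
            // With Lucene 4+, one Document per file would be built here, roughly:
            //   Document doc = new Document();
            //   doc.add(new StringField("fileName", fileName, Field.Store.YES));
            //   doc.add(new TextField("contents", text, Field.Store.NO));
            //   writer.addDocument(doc);
            System.out.println(fileName + " -> " + text);
        });
    }
}
```

Repeating a word once per pair keeps the original term frequency when the text is tokenized on whitespace (for example with Lucene's WhitespaceAnalyzer). If a single file's word list could itself be too large for memory, a TextField can also be built from a java.io.Reader instead of a String.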
From: Lokendra Singh [mailto:lsingh....@gmail.com]
Sent: Friday, February 25, 2011 1:27 AM
To: java-u...@lucene.apache.org
Cc: dev@lucene.apache.org
Subject: Converting an existing index format to Lucene Index

Hi all,

I am seeking some guidelines for directly converting an existing index into a Lucene index.

The index available to me is a set of <value1, value2> pairs, where each pair is <word, fileName>: the word is 'value1', and 'value2' is the name of a file containing that word. A word may appear in several files, and the same file may contain multiple copies of a word. For example, the following index is possible:

< "my", "file1" >
< "you", "file2" >
< "my", "file2" >
< "my", "file1" >

My actual problem is that the index is very large, so I am a bit reluctant to create a 'Document' object for each file: to do that, I would have to read through all the pairs first and store them in memory. Alternatively, I would have to 'update' the 'Document' object of a particular file while iterating through the pairs of my index, and this 'update' is, again, a costly operation.

Please correct me if my understanding of Lucene is wrong, or suggest alternative approaches.

Regards,
Lokendra