Lucene doesn't know where a file start or ends, actually it knows, but in your case 1 Docuemtn contains more small documents.If you want to split your big file in small files you must to that by yourself, Take a look at the Document class and you will see that Lucene use a Reader to index the body of a file, so may be you should build a class that return a Reader for each sub-document you want. But i think is easier split your main document in small document, index this small documents with a common "keyword" that is the actual Big file name, so when you'll search you can understand where this "sub" document is allocated. After you index those files you can delete them. What you need is a BigDocumentManager that:
1.split your big file/s 2.index them. (don't forget the keyword => big doc name) 3.delete those "sub" documents (are like temp docs). Hope this helps. -- On Wed, 12 Jun 2002 02:26:58 Chris Sibert wrote: >I have a big ( 40 MB or so) file to index. The file contains a whole bunch >of documents, which are each pretty small, about a few typewritten pages >long. There's a title, date, and author for each document, in addition to >the documents' actual text. > >I'm not quite sure how you index this in Lucene. For each document in the >original file, I assume that I create a separate Lucene Document object in >the index with author, date, title, and text fields. If so, my question is >that when I'm reading in the original file for indexing, does Lucene know >where each document begins and ends in the original file ? Or do I have to >write a parser or filter or something for the InputStream that's reading the >file ? > >Chris Sibert > > > >-- >To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> >For additional commands, e-mail: <mailto:[EMAIL PROTECTED]> > > _______________________________________________________ WIN a first class trip to Hawaii. Live like the King of Rock and Roll on the big Island. Enter Now! http://r.lycos.com/r/sagel_mail/http://www.elvis.lycos.com/sweepstakes -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
