This really looks like an issue with the batching mechanism (one of those errors that seem trivial once discovered :) ). I use batched indexing myself, and it works absolutely fine on far larger volumes of data. You could try constructing the IndexWriter without the 3rd argument and see if that helps. Also, which version of Lucene are you using?
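In case it is useful while you debug, here is a minimal sketch of one way to work out the create flag before each batch, using only the JDK. `BatchIndexFlags` and `shouldCreate` are hypothetical names of mine, not Lucene API, and treating "directory missing or empty" as "no index yet" is an assumption. The point is only that the first batch should pass true, because create=true on a later batch would replace whatever the earlier batches wrote.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper (not part of Lucene): decides the "create" flag for a batch.
final class BatchIndexFlags {

    private BatchIndexFlags() {
    }

    /**
     * Returns true when indexDir is missing or contains no entries, i.e. when
     * this batch is assumed to be the first one and must create the index.
     * Returns false when the directory already holds files, so the batch
     * should append instead.
     */
    static boolean shouldCreate(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return true; // nothing on disk yet
        }
        // Close the directory stream promptly; an open handle can block deletes on some platforms.
        try (Stream<Path> entries = Files.list(indexDir)) {
            return !entries.findAny().isPresent();
        }
    }
}
```

The result would then go in as the third constructor argument in place of `!FSdir.exists`. Logging the value at the start of every batch is a cheap way to confirm that it is true exactly once.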
--
Anshum
http://ai-cafe.blogspot.com

On Thu, Aug 7, 2008 at 6:09 AM, yanyanzeng <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I am building a search engine for text transcript documents from the
> database of an enterprise messaging system, and have designed a batch
> processing job to build the index incrementally, because the production
> database is huge, around 10G.
> I am still testing in the DEV environment, and have been puzzled by this
> problem for a couple of days.
> If I build the index in one setting (because the DEV database is very, very
> small), the index is correct: I get hits for my queries, and what Luke
> shows looks fine, 4800 documents, 450 terms.
> However, if I build it using my batch processing job, I get an index that
> looks fine, but when I search, it always returns 0 hits. I checked with
> Luke, which shows there are 5200 documents, 0 terms.
> There is no exception, runtime error, or anything abnormal during indexing
> or searching. I am really at a loss.
> The only difference between the two is this: in the one setting approach,
> the whole index is built using the same IndexWriter object; in the batch
> approach, an IndexWriter object is opened per batch and closed when the
> batch is finished.
> But I think I have taken care of that with
>
>     IndexWriter writer = new IndexWriter(FSDir, Analyser, !FSdir.exists)
>
> Since Lucene is designed to add to an existing index when the 3rd
> parameter is false, I do not understand where it went wrong.
> Should I have kept one singleton instance of the writer until all
> documents in the database are processed, rather than opening and closing
> one for each batch? Or should I have kept a single instance of the analyser?
> This does not seem necessary, but I really cannot figure out where it went
> wrong, or what explains this strange behavior: 520 documents but 0 terms.
>
> I would be very grateful if anyone could advise. Thanks very much.
>
> yanyan

--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw...