Hi, thank you very much for your reply. I am using the latest version, Lucene 2.3.2. I will try using the two-argument constructor and post my results later.
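What I plan to try is something like this, a minimal sketch only (the index path, the analyzer choice, and the sample field are placeholders for my real code):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class TwoArgWriterTest {
        public static void main(String[] args) throws Exception {
            File indexDir = new File("/path/to/index"); // placeholder path

            // Two-argument constructor: appends if the index already exists,
            // creates it otherwise, so no create flag is needed per batch.
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer());
            try {
                Document doc = new Document();
                doc.add(new Field("body", "sample transcript text",
                        Field.Store.YES, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            } finally {
                writer.close();
            }
        }
    }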
yanyan


Anshum-2 wrote:
>
> This really seems like an issue with the batching mechanism (one of those
> errors which seem trivial on discovery :) ). I work with batched indexing
> and it works absolutely fine on data that is a lot larger in magnitude. You
> could try calling the IndexWriter without the 3rd argument and see if it
> helps. Also, which version of Lucene are you using?
>
> --
> Anshum
> http://ai-cafe.blogspot.com
>
> On Thu, Aug 7, 2008 at 6:09 AM, yanyanzeng <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> I am building a search engine for text transcript documents from the
>> database of an enterprise messaging system, and have designed a batch
>> processing job to incrementally build the index, because the production
>> database is huge, around 10 GB.
>> I am still testing in the DEV environment, and have been puzzled by this
>> problem for a couple of days.
>> If I build the index in one sitting (because the DEV database is very
>> small), the index is correct: I get hits for my queries, and what Luke
>> shows looks fine, 4800 documents, 450 terms.
>> However, if I build it using my batch processing job, I get an index
>> that looks fine, but when I search it always returns 0 hits. I checked
>> with Luke, which shows there are 5200 documents, 0 terms.
>> There is no exception or runtime error or anything abnormal during
>> indexing or searching, so I am really at a loss.
>> The only difference between the two is that in the one-sitting approach
>> the whole index is built using the same IndexWriter object, while in the
>> batch approach an IndexWriter object is opened per batch and closed when
>> the batch is finished.
>> But I think I have taken care of that by
>>     IndexWriter writer = new IndexWriter(FSDir, Analyser, !FSDir.exists());
>>
>> Since Lucene is designed to add to an existing index when the 3rd
>> parameter is false, I do not understand where it went wrong.
>> Should I have kept one singleton instance of the writer until all
>> documents in the database are processed, rather than opening and closing
>> one for each batch? Or should I have kept a single instance of the
>> analyzer? This does not seem necessary, but I really cannot figure out
>> where it went wrong, and why this strange behavior: 5200 documents but
>> 0 terms.
>>
>> I would be very grateful if anyone could advise. Thanks very much.
>>
>> yanyan
>>
>
>
> --
> --
> The facts expressed here belong to everybody, the opinions to me.
> The distinction is yours to draw............
>
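P.S. To make the batch pattern above concrete, each run of my job currently does roughly the following (a sketch only; the index path and the way documents arrive in the batch are simplified placeholders for my own code):

    import java.io.File;
    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class IncrementalBatchJob {
        private final File indexDir = new File("/path/to/index"); // placeholder path

        // One batch = open a writer, add the batch's documents, close the writer.
        public void indexBatch(List<Document> batch) throws Exception {
            // Create the index only if it does not exist yet; otherwise append to it.
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(),
                    !indexDir.exists());
            try {
                for (Document doc : batch) {
                    writer.addDocument(doc);
                }
            } finally {
                writer.close();
            }
        }
    }

This is the part I will change to the two-argument constructor next.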