--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> [...]
> > However with a relatively high number of random insertions, the cost
> > of the "new IndexWriter / index.close()" performed for each insertion
> > is too high.
>
> Did you measure that? How much slower was it? Did you perform any
> profiling?
I made 100 consecutive insertions and repeated the test 5 times for each configuration. Each insertion was enclosed in a transaction.

a - the average time for 100 insertions when reusing the same IndexWriter for the whole batch of 100 insertions is 1486 ms.
b - the average time for 100 insertions when creating a new IndexWriter for each individual insertion is 14135 ms.

Test a is around 10 times faster than b. Unfortunately, b is the configuration I need if the transactional behavior is to be respected.

> Perhaps one could improve this by, e.g., disabling document
> index buffering, so that indexes are written directly to the final
> directory in this case, rather than first buffered in a RAMDirectory.

I don't understand this point. Since, to guarantee that the transactions are safe, I need (until there is a better solution) to enclose each insertion in a new IndexWriter .. indexWriter.close() pair, what could the impact of disabling document buffering be?

Anyway, I tried setMaxBufferedDocs(1) with configuration b; the results didn't change significantly. I also tried setMaxBufferedDocs(0), but the application didn't return. It seems that in this case Lucene enters an endless loop.

> So you've got multiple threads? Or are you proceeding in the face of
> exceptions? Otherwise I would expect that if transaction-1 fails then
> you'd avoid transaction-2, no?

In a real application there would typically be multiple threads. My 'pseudo-code' was not very clear: it didn't mean that transaction-2 comes after transaction-1 in the code, but that transaction-2 happens later in the execution of the application. For my testing I actually did something like this:

    for (int i = 0; i < 100; i++) {
        ...
        try {
            insert(myItem);
        } catch (Exception e) {
            logger.error("Your item was not inserted");
        }
    }
    ...
    void insert(Item item) {
        beginTransaction();
        indexWriter.addDocument(item.toDocument());
        // insert item in database
        ...
        // Simulation of something going wrong
        if (Math.random() > 0.5) {
            throw new Exception("Something wrong");
        }
        commit();
    }

In a case like this, if one transaction fails it doesn't impact the transactions that occur later in the loop.

> Also, you'd want to add a flush() call after each addDocument(), since
> document additions are buffered. But a flush() is just what
> IndexWriter.close() does, so then things would not be any faster than
> creating a new IndexWriter for each document.

I guess you meant IndexWriter.flushRamSegments()? I did that and, as you said, it wasn't any faster.

> The bottom line is that there are optimizations to be made when batching
> additions. Lucene's API is designed to encourage batching, so that
> these optimizations may be used. If you don't batch, things will be
> somewhat slower.

In this case it's 10 times slower. So do you think that creating a custom IndexWriter that removes segmentInfos entries (or maybe other entries) in case of a transaction failure is too simplistic an approach?

Oscar
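P.S. The rollback idea above (remove the entries a failed transaction added) can be sketched without any Lucene dependency. Everything here is hypothetical: TransactionalInsertSketch and the two List stand-ins are not Lucene or database APIs, and rollback is simulated by truncating the in-memory "index" back to its size at transaction start, which mirrors removing segmentInfos entries on failure.

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionalInsertSketch {

    static List<String> index = new ArrayList<>();    // stand-in for the Lucene index
    static List<String> database = new ArrayList<>(); // stand-in for the database table

    static void insert(String item, boolean fail) throws Exception {
        int mark = index.size();          // beginTransaction: remember index state
        index.add(item);                  // indexWriter.addDocument(...)
        try {
            if (fail) {                   // simulation of something going wrong
                throw new Exception("Something wrong");
            }
            database.add(item);           // insert item in database
        } catch (Exception e) {
            // rollback: drop the index entries added since the transaction began
            while (index.size() > mark) {
                index.remove(index.size() - 1);
            }
            throw e;
        }
        // commit: nothing further to do in this toy model
    }

    public static void main(String[] args) {
        String[] items = { "a", "b", "c" };
        boolean[] fails = { false, true, false };
        for (int i = 0; i < items.length; i++) {
            try {
                insert(items[i], fails[i]);
            } catch (Exception e) {
                System.out.println("item " + items[i] + " was not inserted");
            }
        }
        // The failed transaction ("b") leaves neither store modified:
        System.out.println("index=" + index + " database=" + database);
        // prints: index=[a, c] database=[a, c]
    }
}
```

As in the loop above, a failed transaction is caught and logged without affecting the transactions that follow it.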