--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> [...]
> > However with a relatively high number of random insertions, the cost of the
> > "new IndexWriter / index.close()" performed for each insertion is two high.
> 
> Did you measure that?  How much slower was it?  Did you perform any 
> profiling? 

I made 100 consecutive insertions and repeated the test 5 times for each
configuration. Each insertion was enclosed in a transaction.
a - the average time for 100 insertions when reusing the same IndexWriter for
the whole batch of 100 insertions is 1486 ms.
b - the average time for 100 insertions when opening a new IndexWriter for
each individual insertion is 14135 ms.
Configuration a is around 10 times faster than b. Unfortunately, a is also the
configuration where the transactional behavior is not respected.
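
Roughly, the two timed configurations looked like the sketch below (the
directory path, analyzer and makeDocument() are placeholders, not my actual
code):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Configuration a: one IndexWriter shared by the whole batch of 100 insertions.
IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
for (int i = 0; i < 100; i++)
{
  writer.addDocument(makeDocument(i)); // stands in for item.toDocument()
}
writer.close();

// Configuration b: a new IndexWriter (and a close()) for every insertion.
for (int i = 0; i < 100; i++)
{
  IndexWriter w = new IndexWriter("/tmp/index", new StandardAnalyzer(), false);
  w.addDocument(makeDocument(i));
  w.close();
}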

> Perhaps one could improve this by, e.g., disabling document 
> index buffering, so that indexes are written directly to the final 
> directory in this case, rather than first buffered in a RAMDirectory.

I don't understand this point. To guarantee that the transactions are safe I
need, until there is a better solution, to enclose each insertion in a new
IndexWriter .. indexWriter.close() pair. What would be the impact of disabling
document buffering in that case?

Anyway, I tried setMaxBufferedDocs(1) with configuration b. The results didn't
change significantly. I also tried setMaxBufferedDocs(0), but the application
never returned; it seems that in this case Lucene enters an endless loop.
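
For reference, the attempt looked roughly like this (the path and item are
again placeholders):

IndexWriter w = new IndexWriter("/tmp/index", new StandardAnalyzer(), false);
w.setMaxBufferedDocs(1); // flush each document as it is added; 0 made the call hang
w.addDocument(item.toDocument());
w.close();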

> So you've got multiple threads?  Or are you proceeding in the face of 
> exceptions?  Otherwise I would expect that if transaction-1 fails then 
> you'd avoid transaction-2, no?

In a real application it would typically be multiple threads. My 'pseudo-code'
was not very clear: it didn't mean that transaction-2 came after transaction-1
in the code, but that transaction-2 happens later in the execution of the
application.

Actually for my testing I did something like this:

for (int i = 0; i < 100; i++)
{
  ...
  try
  {
    insert(myItem);
  }
  catch (Exception e)
  {
    logger.error("Your item was not inserted");
  }
}
..
void insert(Item item) throws Exception
{
  beginTransaction();   // pseudo-code: start the transaction
  indexWriter.addDocument(item.toDocument());
  insertInDatabase(item);   // pseudo-code: insert the item in the database
  // Simulation of something going wrong
  if (Math.random() > 0.5)
  {
    throw new Exception("Something went wrong");
  }
  commit();   // pseudo-code: commit the transaction
}

In a case like this, if one transaction fails it doesn't impact the
transactions that occur later in the loop.

> 
> Also, you'd want to add a flush() call after each addDocument(), since 
> document additions are buffered.  But a flush() is just what 
> IndexWriter.close() does, so then things would not be any faster than 
> creating a new IndexWriter for each document.

I guess you meant IndexWriter.flushRamSegments()? I did that, and as you said,
it wasn't any faster.
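
I.e. something like the following after each addition (flushRamSegments() is
not part of the public API, so this assumes it has been made accessible, e.g.
in a patched copy of IndexWriter):

indexWriter.addDocument(item.toDocument());
indexWriter.flushRamSegments(); // force the buffered document out of RAM into the index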

> The bottom line is that there are optimizations to be made when batching 
> additions.  Lucene's API is designed to encourage batching, so that 
> these optimizations may be used.  If you don't batch, things will be 
> somewhat slower.

In this case it's 10 times slower.

So you think that creating a custom IndexWriter that would remove segmentInfos
entries (or maybe other entries) in case of a transaction failure is too
simplistic an approach?
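
To make the idea concrete, here is a rough, untested sketch of what I have in
mind. It assumes the segmentInfos field is reachable from a subclass in the
same package (which may well require patching IndexWriter), and it ignores the
files of the discarded segments, which would be left behind as garbage:

package org.apache.lucene.index;

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.store.Directory;

// Untested sketch: remember the segment count before a transaction and
// drop any segment entries added since, if the transaction fails.
public class TransactionalIndexWriter extends IndexWriter
{
  private int mark;

  public TransactionalIndexWriter(Directory d, Analyzer a, boolean create)
      throws IOException
  {
    super(d, a, create);
  }

  public void begin()
  {
    mark = segmentInfos.size(); // snapshot the current segment count
  }

  public void rollback()
  {
    // remove the entries for segments created since begin()
    while (segmentInfos.size() > mark)
    {
      segmentInfos.remove(segmentInfos.size() - 1);
    }
  }
}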

Oscar
