--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> [...]
> > However, with a relatively high number of random insertions, the cost of the
> > "new IndexWriter / index.close()" performed for each insertion is too high.
>
> Did you measure that? How much slower was it? Did you perform any
> profiling?
I made 100 consecutive insertions and repeated the run 5 times for each test.
Each insertion was enclosed in a transaction.
a - the average time for 100 insertions when reusing the same IndexWriter for
the whole batch of 100 insertions is 1486 ms.
b - the average time for 100 insertions when using a new IndexWriter for each
individual insertion is 14135 ms.
Test a is around 10 times faster than b. Unfortunately, b is the only
configuration where the transactional behavior is respected.
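For reference, the two configurations looked roughly like this (a sketch only,
omitting the transaction handling shown further below; dir, analyzer and
makeDocument() are placeholders for my FSDirectory, analyzer and document
construction):

    // Configuration a: one IndexWriter shared by the whole batch of 100 insertions.
    IndexWriter writer = new IndexWriter(dir, analyzer, false);
    for (int i = 0; i < 100; i++)
    {
        writer.addDocument(makeDocument(i));
    }
    writer.close();

    // Configuration b: a new IndexWriter opened and closed for every insertion.
    for (int i = 0; i < 100; i++)
    {
        IndexWriter w = new IndexWriter(dir, analyzer, false);
        w.addDocument(makeDocument(i));
        w.close();
    }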
> Perhaps one could improve this by, e.g., disabling document
> index buffering, so that indexes are written directly to the final
> directory in this case, rather than first buffered in a RAMDirectory.
I don't understand this point. To guarantee that the transactions are safe I
need, until there is a better solution, to enclose each insertion in a new
IndexWriter .. indexWriter.close() pair, so what would be the impact of
disabling document buffering?
Anyway, I tried setMaxBufferedDocs(1) with configuration b. The results
didn't change significantly. I also tried setMaxBufferedDocs(0), but the
application didn't return; it seems that in this case Lucene enters an
endless loop.
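For completeness, in configuration b the call was made right after opening the
writer, roughly:

    IndexWriter w = new IndexWriter(dir, analyzer, false);
    w.setMaxBufferedDocs(1);  // flush each document straight to the directory
    w.addDocument(item.toDocument());
    w.close();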
> So you've got multiple threads? Or are you proceeding in the face of
> exceptions? Otherwise I would expect that if transaction-1 fails then
> you'd avoid transaction-2, no?
In a real application there would typically be multiple threads. My
'pseudo-code' was not very clear: it didn't mean that transaction-2 comes
after transaction-1 in the code, but that transaction-2 happens later in the
execution of the application.
Actually for my testing I did something like this:
for (int i = 0; i < 100; i++)
{
    ...
    try
    {
        insert(myItem);
    }
    catch (Exception e)
    {
        logger.error("Your item was not inserted", e);
    }
}
...

void insert(Item item) throws Exception
{
    beginTransaction();                          // open the database transaction
    indexWriter.addDocument(item.toDocument());  // buffered in RAM, not yet flushed
    insertItemInDatabase(item);                  // the relational part of the transaction
    // Simulation of something going wrong
    if (Math.random() > 0.5)
    {
        throw new Exception("Something wrong");
    }
    commit();                                    // commit the database transaction
}
In a case like this, if one transaction fails it doesn't impact the
transactions that occur later in the loop.
>
> Also, you'd want to add a flush() call after each addDocument(), since
> document additions are buffered. But a flush() is just what
> IndexWriter.close() does, so then things would not be any faster than
> creating a new IndexWriter for each document.
I guess you meant IndexWriter.flushRamSegments()? I did that and, as you said,
it wasn't faster.
> The bottom line is that there are optimizations to be made when batching
> additions. Lucene's API is designed to encourage batching, so that
> these optimizations may be used. If you don't batch, things will be
> somewhat slower.
In this case it's 10 times slower.
So do you think that creating a custom IndexWriter that removes segmentInfos
entries (or maybe other entries) in case of transaction failure is too
simplistic an approach?
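Something like the following rough sketch is what I had in mind. It is only an
illustration of the idea, not working code: segmentInfos is private in the
stock IndexWriter, so this would mean patching Lucene (or putting the class in
the org.apache.lucene.index package with the field made package-visible), and
TransactionalIndexWriter, begin() and rollback() are names I am making up:

    package org.apache.lucene.index;

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.store.Directory;

    // Hypothetical sketch: assumes segmentInfos were made visible to subclasses.
    public class TransactionalIndexWriter extends IndexWriter
    {
        private int checkpoint;  // segment count at the start of the transaction

        public TransactionalIndexWriter(Directory d, Analyzer a, boolean create)
            throws IOException
        {
            super(d, a, create);
        }

        public void begin()
        {
            checkpoint = segmentInfos.size();  // remember where we started
        }

        public void rollback()
        {
            // Drop the entries added since begin(); SegmentInfos extends Vector.
            segmentInfos.setSize(checkpoint);
            // The orphaned segment files would still have to be deleted, and
            // any documents buffered in RAM discarded, for this to be complete.
        }
    }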
Oscar