On Thu, 2008-03-06 at 18:40 +0100, [EMAIL PROTECTED] wrote:
> > >  With a commit after every add: 30 min.
> > >  With a commit after 100 add: 23 min.
> > >  Only one commit: 20 min.

[...]

> I think it is a real-world scenario because one always has to read the docs
> from somewhere and often has to store the index state somewhere else.

Very true, but the time it takes to create the documents varies greatly
between systems.

I tried repeating your test by creating a simple 14 MB index with 10,000
documents on my desktop machine. Each document was made up of

 - one non-tokenized unique stored indexed field
 - one non-tokenized indexed stored field with one of 9 terms
 - one tokenized field with 930 random characters, including space
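
For anyone who wants to build a comparable corpus, here is a minimal
sketch of the document layout listed above, written as plain Python
dicts rather than real index documents. The field names ("id",
"category", "body"), the nine term values and the character set are
assumptions made for illustration; they are not the ones used in the
test.

```python
import random
import string

# Hypothetical values: the original test does not name its 9 terms.
TERMS = [f"term{i}" for i in range(9)]
# Assumed character set: lowercase letters plus space.
ALPHABET = string.ascii_lowercase + " "


def make_document(doc_id, rng):
    """Build one test document with the three fields described above."""
    return {
        # Unique, non-tokenized, stored, indexed field.
        "id": f"doc{doc_id}",
        # Non-tokenized, indexed, stored field holding one of 9 terms.
        "category": rng.choice(TERMS),
        # Tokenized field: 930 random characters, including space.
        "body": "".join(rng.choice(ALPHABET) for _ in range(930)),
    }


def make_corpus(n=10_000, seed=0):
    """Return n documents; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [make_document(i, rng) for i in range(n)]
```

Calling make_corpus() returns the 10,000 dicts, which can then be fed to
whatever indexing loop is being timed.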

With a commit after every add: 4 min, 46 sec.
With a commit after every 100 adds: 12 sec.
Only one commit: 8 sec.


Guesstimating the amortized time spent on adding each document on such a
small corpus, by blatantly ignoring the overhead of creating the
documents, gives us the following:

With a commit after every add: (286 sec / 10,000 docs) 28.6 ms.
With a commit after every 100 adds: (12 sec / 10,000 docs) 1.2 ms.
Only one commit: (8 sec / 10,000 docs) 0.8 ms.
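
The arithmetic behind these figures can be reproduced with a few lines
of Python. This is only a sketch of the division above; the function
name amortized_ms and the dictionary labels are made up for the example.

```python
DOCS = 10_000

# Total wall-clock times from the test above, in seconds.
totals_sec = {
    "commit after every add": 4 * 60 + 46,  # 4 min, 46 sec = 286 sec
    "commit after every 100 adds": 12,
    "only one commit": 8,
}


def amortized_ms(total_sec, docs=DOCS):
    """Average time per added document, in milliseconds."""
    return total_sec * 1000 / docs


for label, total in totals_sec.items():
    print(f"{label}: {amortized_ms(total):.1f} ms per document")
```

On these numbers, committing after every add costs roughly 36 times as
much per document as a single commit at the end (286 sec versus 8 sec).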

