6 okt 2009 kl. 18.54 skrev David Causse:
David, your timing couldn't be better. Just the other day I proposed
that we deprecate InstantiatedIndexWriter. The sum of the reasons to
this is that I'm a bit lazy. Your mail makes me reconsider.
https://issues.apache.org/jira/browse/LUCENE-1948
On the index time InstantiatedIndex is behind RAMDirectory, but the
time
Would you mind benchmarking some for me using your corpora? The issue
suggests that people use the InstantiatedIndex(IndexReader)
constructor to create the index rather than using
InstantiatedIndexWriter. Is it way slower for you to produce the index
using RAMDirectory/IndexWriter and pass an IndexReader to
InstantiatedIndex?
This is what the package level javadocs says about
InstantiatedIndexWriter:
"Hardly any effort has been put in to optimizing the
InstantiatedIndexWriter, only minimizing the amount of time needed to
write-lock the index has been considered."
I'm sure there are ways to speed it up, I just never managed to find
the time to look in to it. I never really used IIW.
It might be worth mentioning that when InstantiatedIndex#commit
returns it has yeilded an optimized "single segment" index. This is
not quite how a Directory/IndexWriter acts.
gained over queries make it better (for what I see it can be 2 times
faster).
InstantiatedIndex will be our default volatile mini index store for
our
next production release.
Very cool!!
Whe should have other needs of this index but the lack of addIndexes
support make it impossible for us to use it in other situations. So we
continue to use RAMDirectory in such situations.
Have you considered using multiple InstantiatedIndex and a
MultiReader? That would pretty much be the same thing, just that the
store wouldn't be quite as optimized. It would definitly use more RAM
than if it was the same index. You could of course also pass this
MultiReader to a new InstantiatedIndex. I have no real clue about the
difference in speed and RAM consuption between these solutions so you
should benchmark all solutions.
Do you think we could reach RAMDirectory index time by tweaking some
initialCap
stuff inside java.util.Collections you use?
Maybe. But I think it would be a relatively small gain. But don't take
my words for granted, benchmark it.
Using the InstantiatedIndex(IndexReader) constructor will create
rather optimal size of the collections.
As for InstantiatedIndexWriter I think it's pretty much only the
transient collections in #commit that will help you, my guess is that
you should expemient with the dirtyTerms and termsByText attributes.
Count the number of terms in your complete index and see how much it
speeds thing up by creating the collections with this size from the
start.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]