On the solr-user list, Dirk Högemann recently mentioned a problem he was
seeing when he tried upgrading his existing Solr setup from 3.x to
4.0-BETA. Specifically, this exception was getting logged...
http://find.searchhub.org/document/cdb30099bfea30c6
auto commit error...:java.lang.UnsupportedOperationException: this codec can only be used for reading
    at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
    at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
    at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
    at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
    at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
Dirk was able to work around this by completely re-indexing, but it
seemed strange to me that this would happen.
My understanding is that even though an IndexUpgrader tool was now
available, users would not be required to use it when upgrading from
3.x to 4.x. Explicitly upgrading the index format might be a good idea,
and might make the index more performant, but as I understood it, the
way things had been implemented with codecs meant that explicitly
upgrading the index format wasn't strictly necessary, and that users
should be able to upgrade their Lucene apps the same way that was
supported with other index format upgrades in the past: the old index
can be read, and as changes are made, new segments are written in the
new format. (Note in particular: at the moment we don't mention
IndexUpgrader in MIGRATE.txt at all.)
It appears, however, based on this stack trace and some other experiments
I tried, that any attempt to "delete" documents in a segment that is
using the Lucene3xCodec will fail.
This seems like a really scary time bomb situation, because if you
upgrade, things will seem to be working -- you can even add documents, and
depending on the order in which you do things, some "old" segments may get
merged and use the new format, so *some* deletes of "old" documents (in
those merged/upgraded segments) may work. But then somewhere down the
road, you may try a delete that affects docs in a segment that has still
not been merged/upgraded, and that delete will fail -- 5 minutes later, if
another merge has happened, attempting the exact same delete may succeed.
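
To make the failure mode concrete, here is a toy model of it in plain
Java. This is not Lucene code: SegmentDeleteModel, Segment, merge and
tryDelete are hypothetical names I made up for illustration, and the only
thing borrowed from the stack trace is the exception type and message.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Lucene code) of deletes against old-format segments. */
class SegmentDeleteModel {

    /** A segment is either in the old read-only format or the current one. */
    static final class Segment {
        final String name;
        final boolean oldFormat; // true stands in for a Lucene3xCodec segment
        final List<String> liveDocs = new ArrayList<String>();

        Segment(String name, boolean oldFormat, String... docIds) {
            this.name = name;
            this.oldFormat = oldFormat;
            for (String id : docIds) {
                liveDocs.add(id);
            }
        }

        /** Deleting a doc held by an old-format segment fails, as in the trace. */
        void delete(String docId) {
            if (!liveDocs.contains(docId)) {
                return; // doc is not in this segment, nothing to write
            }
            if (oldFormat) {
                throw new UnsupportedOperationException(
                        "this codec can only be used for reading");
            }
            liveDocs.remove(docId);
        }
    }

    /** Merging rewrites all inputs into a single new-format segment. */
    static Segment merge(String name, Segment... inputs) {
        Segment merged = new Segment(name, false);
        for (Segment s : inputs) {
            merged.liveDocs.addAll(s.liveDocs);
        }
        return merged;
    }

    /** Returns true if the delete went through, false if the codec refused it. */
    static boolean tryDelete(Segment segment, String docId) {
        try {
            segment.delete(docId);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Segment old = new Segment("_0", true, "a", "b");
        Segment fresh = new Segment("_1", false, "c");
        System.out.println("delete a before merge: " + tryDelete(old, "a"));
        Segment merged = merge("_2", old, fresh);
        System.out.println("delete a after merge:  " + tryDelete(merged, "a"));
    }
}
```

In this sketch the same delete is refused before the merge and accepted
after it, which is the "works 5 minutes later" behavior described above.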
All of which begs the question: is this a known/intended limitation of the
Lucene3xCodec, or an oversight in the Lucene3xCodec?
If it's expected, then it seems like we should definitely spell out this
limitation in MIGRATE.txt and advocate either full rebuilds, or the use of
IndexUpgrader for anyone whose indexes are non-static.
On the Solr side of things, I think we should even consider
automatically running IndexUpgrader on startup if we detect that the
Lucene3xCodec is in use, to simplify things -- we can't even suggest
running "optimize" as a quick/easy way to force an index format upgrade,
because if the 3x index is already optimized then it's a no-op and the
index stays in the 3x format.
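
A rough sketch of why "optimize" and an explicit upgrade differ here,
again as a toy model in plain Java rather than real Lucene code. The class
and method names (UpgradeSelectionModel, optimizeRewrites,
upgradeRewrites) and the "3x"/"4x" labels are hypothetical, and the model
deliberately ignores deletions and everything else a real merge policy
looks at.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Lucene code): which segments each strategy would rewrite. */
class UpgradeSelectionModel {

    /** "Optimize" only has work to do when there is more than one segment. */
    static List<String> optimizeRewrites(List<String> segmentFormats) {
        List<String> out = new ArrayList<String>();
        if (segmentFormats.size() > 1) {
            out.addAll(segmentFormats);
        }
        return out;
    }

    /** An explicit upgrade rewrites every segment still in the old format. */
    static List<String> upgradeRewrites(List<String> segmentFormats, String oldFormat) {
        List<String> out = new ArrayList<String>();
        for (String format : segmentFormats) {
            if (format.equals(oldFormat)) {
                out.add(format);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // An already-optimized 3x index: a single old-format segment.
        List<String> optimized3x = new ArrayList<String>();
        optimized3x.add("3x");
        System.out.println("optimize rewrites: " + optimizeRewrites(optimized3x));
        System.out.println("upgrade rewrites:  " + upgradeRewrites(optimized3x, "3x"));
    }
}
```

Under these assumptions, the single-segment 3x index gives "optimize"
nothing to do, while a format-driven upgrade still picks up the segment.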
-Hoss