[ https://issues.apache.org/jira/browse/LUCENE-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-4339:
----------------------------------
Attachment: LUCENE-4339.patch
This is Robert's patch.
> Allow deletions on Lucene3x codecs again
> ----------------------------------------
>
> Key: LUCENE-4339
> URL: https://issues.apache.org/jira/browse/LUCENE-4339
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0-BETA
> Reporter: Uwe Schindler
> Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-4339.patch
>
>
> On dev@lao Hoss reported that a user in Solr was not able to update or delete
> documents in his 3.x index with Solr 4:
> {quote}
> On the solr-user list, Dirk Högemann recently mentioned a problem he was
> seeing when he tried upgrading his existing solr setup from 3.x to 4.0-BETA.
> Specifically this exception getting logged...
> http://find.searchhub.org/document/cdb30099bfea30c6
> auto commit error...:java.lang.UnsupportedOperationException: this codec can only be used for reading
>     at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
>     at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
>     at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
>     at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
>     at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
>     at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
>     at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
>     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
>     at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
> Dirk was able to work around this by completely re-indexing, but it seemed
> strange to me that this would happen.
> My understanding is that even though an IndexUpgrader tool was now available,
> users weren't going to be required to use it when upgrading from 3.x to
> 4.x. Explicitly upgrading the index format might be a good idea, and might
> make the index more performant, but as I understood it, the way codecs had
> been implemented made an explicit index format upgrade not strictly
> necessary, and users should be able to upgrade their Lucene apps the same
> way that was supported for other index format upgrades in the past: the
> old index can be read, and as changes are made, new segments will be
> written in the new format. (Note in particular: at the moment we don't
> mention IndexUpgrader in MIGRATE.txt at all.)
> It appears, however, based on this stack trace and some other experiments I
> tried, that any attempt to "delete" documents in a segment that is using the
> Lucene3xCodec will fail.
> This seems like a really scary time-bomb situation, because if you upgrade,
> things will seem to be working -- you can even add documents, and depending
> on the order in which you do things, some "old" segments may get merged and
> use the new format, so *some* deletes of "old" documents (in those
> merged/upgraded segments) may work, but then somewhere down the road, you
> may try a delete that affects docs in a still un-merged/un-upgraded segment,
> and that delete will fail -- 5 minutes later, if another merge has happened,
> attempting the exact same delete may succeed.
> All of which raises the question: is this a known/intended limitation of the
> Lucene3xCodec, or an oversight?
> If it's expected, then it seems like we should definitely spell out this
> limitation in MIGRATE.txt and advocate either full rebuilds or the use of
> IndexUpgrader for anyone whose indexes are non-static.
> On the Solr side of things, I think we should even consider automatically
> running IndexUpgrader on startup if we detect that the Lucene3xCodec is in
> use, to simplify things -- we can't even suggest running "optimize" as a
> quick/easy way to force an index format upgrade, because if the 3.x index
> was already optimized, it's a no-op and the index stays in the 3.x format.
> {quote}
> Robert said that this is an intended limitation (in fact it was explicitly
> added to the code; without that UOE it "simply works"), but I and lots of
> other people disagree here:
> {quote}
> In the early days (I mean the time when it was already read-only, until we
> refactored the IndexReader.delete()/Codec stuff), this was working, because
> the LiveDocs were always handled in a special way. Making it 100% read-only
> now is in my opinion very bad, as it no longer allows updating documents in
> a 3.x index, so you have no choice: you must run IndexUpgrader.
> The usual step of opening an old index and adding documents works (because
> new documents are always added to a new segment), but the even more common
> IW.updateDocument(), which is also widely used to add documents, fails on
> old indexes. This is a no-go; we have to fix this. If we allow the trick of
> updating LiveDocs with the 3.x codec, the "read-only" nature of the Lucene3x
> codec would be completely invisible to the end user, who could do everything
> IndexWriter provides. The other horrible things, like changing norms, are no
> longer possible, so deletes are the only thing affected here. The
> read-only-ness of the Lucene3x codec would only be visible when someone
> explicitly tries to create an index with the Lucene3x codec. And that is
> exactly how I understood CHANGES/MIGRATE.txt.
> {quote}
> On the list, Robert added a simple patch, reverting the UOE in Lucene3xCodec,
> so the LiveDocs format is RW again.
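The disagreement comes down to how deletes interact with old segments. New documents always go into a new-format segment, but IW.updateDocument() is a delete-by-term followed by an add, and the delete has to write new live docs for whichever segment holds the old document. The sketch below is a hypothetical toy model, not Lucene's API: the class `LiveDocsModel`, the flag `legacyLiveDocsWritable`, and the helper `withLegacySegment` are illustrative names only. It also simplifies one detail: in the reported stack trace the exception surfaces at commit, when buffered deletes are applied, whereas the model throws as soon as the delete is applied.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical toy model (NOT Lucene's API) of why updateDocument() broke on
 * an upgraded 3.x index once the 3.x live-docs writer threw, and why making
 * that writer read/write again is enough to fix it.
 */
public class LiveDocsModel {

    /** One segment: its format, its documents' ids, and a live-docs bitset. */
    static final class Segment {
        final boolean legacy3x;
        final List<String> ids = new ArrayList<>();
        final BitSet live = new BitSet();

        Segment(boolean legacy3x) {
            this.legacy3x = legacy3x;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    // false models 4.0-BETA (UOE for 3.x segments); true models the patch.
    private final boolean legacyLiveDocsWritable;

    LiveDocsModel(boolean legacyLiveDocsWritable) {
        this.legacyLiveDocsWritable = legacyLiveDocsWritable;
    }

    /** Builds an index holding one old 3.x segment with the given ids. */
    static LiveDocsModel withLegacySegment(boolean legacyLiveDocsWritable, String... ids) {
        LiveDocsModel index = new LiveDocsModel(legacyLiveDocsWritable);
        Segment old = new Segment(true);
        for (String id : ids) {
            old.live.set(old.ids.size());
            old.ids.add(id);
        }
        index.segments.add(old);
        return index;
    }

    /** Stand-in for the codec's live-docs writer. */
    private void writeLiveDocs(Segment s) {
        if (s.legacy3x && !legacyLiveDocsWritable) {
            throw new UnsupportedOperationException("this codec can only be used for reading");
        }
        // A real codec would persist s.live as a new deletions file here.
    }

    /** New documents always land in a new-format segment; no 3.x file is touched. */
    void addDocument(String id) {
        Segment last = segments.isEmpty() ? null : segments.get(segments.size() - 1);
        if (last == null || last.legacy3x) {
            last = new Segment(false);
            segments.add(last);
        }
        last.live.set(last.ids.size());
        last.ids.add(id);
    }

    /** A delete must rewrite live docs in every segment holding a match, old or new. */
    void deleteDocuments(String id) {
        for (Segment s : segments) {
            List<Integer> hits = new ArrayList<>();
            for (int doc = s.live.nextSetBit(0); doc >= 0; doc = s.live.nextSetBit(doc + 1)) {
                if (s.ids.get(doc).equals(id)) {
                    hits.add(doc);
                }
            }
            if (hits.isEmpty()) {
                continue; // untouched segments never need their live docs rewritten
            }
            for (int doc : hits) {
                s.live.clear(doc);
            }
            writeLiveDocs(s);
        }
    }

    /** updateDocument = delete by id, then add. */
    void updateDocument(String id) {
        deleteDocuments(id);
        addDocument(id);
    }

    int liveCount() {
        int n = 0;
        for (Segment s : segments) {
            n += s.live.cardinality();
        }
        return n;
    }

    public static void main(String[] args) {
        LiveDocsModel beta = withLegacySegment(false, "a", "b");
        beta.addDocument("c");      // works: goes into a new segment
        beta.updateDocument("c");   // works: only the new segment changes
        try {
            beta.updateDocument("a"); // the delete hits the 3.x segment
        } catch (UnsupportedOperationException e) {
            System.out.println("4.0-BETA: " + e.getMessage());
        }
        LiveDocsModel patched = withLegacySegment(true, "a", "b");
        patched.updateDocument("a");
        System.out.println("patched live docs: " + patched.liveCount());
    }
}
```

Running `main` prints the UOE message for the 4.0-BETA case and `patched live docs: 2` once 3.x live docs are writable. The model reproduces the "time bomb" described above: adds and updates of new documents succeed, and only an update or delete that reaches a still-unmerged 3.x segment fails.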