[ 
https://issues.apache.org/jira/browse/LUCENE-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4339:
----------------------------------

    Attachment: LUCENE-4339.patch

This is Robert's patch.
                
> Allow deletions on Lucene3x codecs again
> ----------------------------------------
>
>                 Key: LUCENE-4339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4339
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0-BETA
>            Reporter: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-4339.patch
>
>
> On dev@lao Hoss reported that a user in Solr was not able to update or delete 
> documents in his 3.x index with Solr 4:
> {quote}
> On the solr-user list, Dirk Högemann recently mentioned a problem he was 
> seeing when he tried upgrading his existing solr setup from 3.x to 4.0-BETA.  
> Specifically this exception getting logged...
> http://find.searchhub.org/document/cdb30099bfea30c6
> auto commit error...:java.lang.UnsupportedOperationException: this codec can only be used for reading
>          at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
>          at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
>          at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
>          at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
>          at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
>          at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
>          at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
>          at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
>          at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
>          at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
>          at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
> Dirk was able to work around this by completely re-indexing, but it seemed
> strange to me that this would happen.
> My understanding is that even though an IndexUpgrader tool is now available,
> users are not required to run it when upgrading from 3.x to 4.x.  Explicitly
> upgrading the index format might be a good idea, and might make the index
> more performant, but as I understood it, the way things had been implemented
> with codecs, explicitly upgrading the index format wasn't strictly
> necessary, and users should be able to upgrade their Lucene apps the same
> way that was supported with other index format upgrades in the past: the old
> index can be read, and as changes are made new segments will be written in
> the new format.  (Note in particular: at the moment we don't mention
> IndexUpgrader in MIGRATE.txt at all.)
> It appears, however, based on this stack trace and some other experiments I
> tried, that any attempt to "delete" documents in a segment that is using the
> Lucene3xCodec will fail.
> This seems like a really scary time bomb situation, because if you upgrade,
> things will seem to be working -- you can even add documents, and depending
> on the order in which you do things, some "old" segments may get merged and
> use the new format, so *some* deletes of "old" documents (in those
> merged/upgraded segments) may work, but then somewhere down the road, you
> may try a delete that affects docs in a segment that has not yet been
> merged/upgraded, and that delete will fail -- 5 minutes later, if another
> merge has happened, attempting the exact same delete may succeed.
> All of which raises the question: is this a known/intended limitation of the
> Lucene3xCodec, or an oversight in the Lucene3xCodec?
> If it's expected, then it seems like we should definitely spell out this
> limitation in MIGRATE.txt and advocate either full rebuilds, or the use of
> IndexUpgrader for anyone whose indexes are non-static.
> On the Solr side of things, I think we should even consider automatically
> running IndexUpgrader on startup if we detect that the Lucene3xCodec is in
> use, to simplify things -- we can't even suggest running "optimize" as a
> quick/easy way to force an index format upgrade, because if the 3x index is
> already optimized then it's a no-op and the index stays in the 3x format.
> {quote}
> Robert said that this is an intended limitation (in fact it is explicitly
> added to the code; without that UOE it "simply works"), but I and lots of
> other people disagree:
> {quote}
> In the early days (I mean the time when it was already read-only, until we
> refactored the IndexReader.delete()/Codec stuff), this was working, because
> the LiveDocs were always handled in a special way. Making it now 100%
> read-only is in my opinion very bad, as it no longer allows updating
> documents in a 3.x index, so you have no choice: you must run IndexUpgrader.
> The usual step of opening an old index and adding documents works (because
> new documents are always added to a new segment), but the much more common
> IW.updateDocument(), which is often also used to add documents, fails on old
> indexes. This is a no-go; we have to fix this. If we allow the trick of
> updating LiveDocs on the 3.x codec, the "read-only" stuff in the Lucene3x
> codec would be completely invisible to the end user, as they can do
> everything IndexWriter provides. The other horrible things like changing
> norms are no longer possible, so deletes are the only thing affected here.
> The read-only-ness of the Lucene3x codec would only be visible when someone
> tries to explicitly create an index with the Lucene3x codec. And I
> understood CHANGES/MIGRATE.txt exactly that way.
> {quote}
> On the list, Robert added a simple patch, reverting the UOE in Lucene3xCodec, 
> so the LiveDocs format is RW again.
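
The failure mode and the proposed fix described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `LiveDocsWriter`, `ReadOnlyLiveDocsWriter`, `WritableLiveDocsWriter`, and `Segment` are hypothetical names invented for this example and are not Lucene classes, and the attached patch may look quite different. It only shows why a delete against an old-format segment fails when writing live docs throws a UOE, and succeeds once that write is allowed again.

```java
import java.util.BitSet;

// Toy model only: these types are hypothetical and are NOT Lucene's API.
interface LiveDocsWriter {
    // Persist the set of non-deleted documents for one segment.
    void writeLiveDocs(BitSet liveDocs);
}

// Models the reported behaviour: a fully read-only codec rejects live-docs writes.
final class ReadOnlyLiveDocsWriter implements LiveDocsWriter {
    @Override
    public void writeLiveDocs(BitSet liveDocs) {
        throw new UnsupportedOperationException("this codec can only be used for reading");
    }
}

// Models the proposed fix: live docs stay writable even for an old-format segment.
final class WritableLiveDocsWriter implements LiveDocsWriter {
    BitSet persisted; // stands in for the on-disk deletes file

    @Override
    public void writeLiveDocs(BitSet liveDocs) {
        persisted = (BitSet) liveDocs.clone();
    }
}

// A segment whose documents can be deleted only if its live docs can be written.
final class Segment {
    private final BitSet liveDocs;
    private final LiveDocsWriter writer;

    Segment(int maxDoc, LiveDocsWriter writer) {
        this.liveDocs = new BitSet(maxDoc);
        this.liveDocs.set(0, maxDoc); // every document starts out live
        this.writer = writer;
    }

    // Returns true if the delete was committed, false if the codec refused the write.
    boolean deleteAndCommit(int docId) {
        liveDocs.clear(docId);
        try {
            writer.writeLiveDocs(liveDocs);
            return true;
        } catch (UnsupportedOperationException e) {
            liveDocs.set(docId); // roll back the in-memory delete
            return false;
        }
    }

    int liveCount() {
        return liveDocs.cardinality();
    }
}

final class LiveDocsSketch {
    public static void main(String[] args) {
        Segment readOnly = new Segment(3, new ReadOnlyLiveDocsWriter());
        Segment patched = new Segment(3, new WritableLiveDocsWriter());
        System.out.println("read-only segment delete committed: " + readOnly.deleteAndCommit(1));
        System.out.println("patched segment delete committed:   " + patched.deleteAndCommit(1));
    }
}
```

In this model, adding documents never touches the old segment, which matches the observation that adds appear to work while deletes and updates against old segments do not.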

