Uwe Schindler created LUCENE-4339:
-------------------------------------

             Summary: Allow deletions on Lucene3x codecs again
                 Key: LUCENE-4339
                 URL: https://issues.apache.org/jira/browse/LUCENE-4339
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.0-BETA
            Reporter: Uwe Schindler
            Priority: Blocker
             Fix For: 4.0


On dev@lao, Hoss reported that a Solr user was not able to update or delete 
documents in his 3.x index with Solr 4:

{quote}
On the solr-user list, Dirk Högemann recently mentioned a problem he was seeing 
when he tried upgrading his existing solr setup from 3.x to 4.0-BETA.  
Specifically this exception getting logged...

http://find.searchhub.org/document/cdb30099bfea30c6

auto commit error...:java.lang.UnsupportedOperationException: this codec can only be used for reading
         at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
         at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
         at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
         at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
         at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
         at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
         at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
         at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
         at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
         at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)

Dirk was able to work around this by completely re-indexing, but it seemed 
strange to me that this would happen.

My understanding is that even though an IndexUpgrader tool was now available, 
it wasn't going to be required for users to use it when upgrading from 3.x to 
4.x.  Explicitly upgrading the index format might be a good idea, and might 
make the index more performant, but as I understood it, with the way things had 
been implemented with codecs, explicitly upgrading the index format wasn't 
strictly necessary, and users should be able to upgrade their lucene apps the 
same way that was supported with other index format upgrades in the past: the 
old index can be read, and as changes are made new segments will be re-written 
in the new format.  (Note in particular: at the moment we don't mention 
IndexUpgrader in MIGRATE.txt at all.)

It appears however, based on this stack trace and some other experiments I 
tried, that any attempt to "delete" documents in a segment that is using the 
Lucene3xCodec will fail.

This seems like a really scary time bomb situation, because if you upgrade, 
things will seem to be working -- you can even add documents, and depending on 
the order that you do things, some "old" segments may get merged and use the 
new format, so *some* deletes of "old" documents (in those merged/upgraded 
segments) may work, but then somewhere down the road, you may try a delete 
that affects docs in a still un-merged/un-upgraded segment, and that delete 
will fail -- 5 minutes later, if another merge has happened, attempting to do 
the exact same delete may succeed.

All of which begs the question: is this a known/intended limitation of the 
Lucene3xCodec, or an oversight in the Lucene3xCodec?

If it's expected, then it seems like we should definitely spell out this 
limitation in MIGRATE.txt and advocate either full rebuilds, or the use of 
IndexUpgrader for anyone whose indexes are non-static.

On the Solr side of things, I think we may even want to consider automatically 
running IndexUpgrader on startup if we detect that the Lucene3xCodec is in use 
to simplify things -- we can't even suggest running "optimize" as a quick/easy 
way to force an index format upgrade, because if the 3x index is already 
optimized then it's a no-op and the index stays in the 3x format.
{quote}
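
To make the "time bomb" behaviour described in the quote easier to follow, 
here is a minimal sketch in plain Java. It is a toy model only: ToySegment and 
ToyIndex are hypothetical names invented for illustration, not Lucene classes, 
and the merge is reduced to "rewrite everything into one new-format segment". 
It only assumes what the quote states: deletes against a segment in the old 
format throw, deletes against new or merged segments succeed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; these are NOT Lucene classes.
final class ToySegment {
    final String name;
    final boolean readOnlyFormat; // true models a segment still in the 3.x format
    final List<Integer> docIds;

    ToySegment(String name, boolean readOnlyFormat, List<Integer> docIds) {
        this.name = name;
        this.readOnlyFormat = readOnlyFormat;
        this.docIds = new ArrayList<>(docIds);
    }

    // Deleting a doc means writing deletion info for this segment,
    // which the read-only format refuses.
    void delete(int docId) {
        Integer boxed = Integer.valueOf(docId);
        if (!docIds.contains(boxed)) {
            return; // this segment does not hold the document
        }
        if (readOnlyFormat) {
            throw new UnsupportedOperationException(
                "this codec can only be used for reading");
        }
        docIds.remove(boxed);
    }
}

final class ToyIndex {
    final List<ToySegment> segments = new ArrayList<>();

    void delete(int docId) {
        for (ToySegment s : segments) {
            s.delete(docId);
        }
    }

    // Models a merge: all segments are rewritten into one new-format segment.
    void mergeAll() {
        List<Integer> all = new ArrayList<>();
        for (ToySegment s : segments) {
            all.addAll(s.docIds);
        }
        segments.clear();
        segments.add(new ToySegment("merged", false, all));
    }

    boolean contains(int docId) {
        for (ToySegment s : segments) {
            if (s.docIds.contains(Integer.valueOf(docId))) {
                return true;
            }
        }
        return false;
    }
}
```

In this model the same delete call fails before mergeAll() and succeeds after 
it, which is the order-dependent behaviour the quote warns about.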

Robert said that this is an intended limitation (in fact it's explicitly added 
to the code; without that UOE it "simply works"), but I disagree here, as do 
lots of other people:

{quote}
In the early days (I mean the time when it was already read-only, until we 
refactored the IndexReader.delete()/Codec stuff), this was working, because the 
LiveDocs were always handled in a special way. Making it now 100% read-only is 
in my opinion very bad, as it no longer allows updating documents in a 3.x 
index, so you have no choice: you must run IndexUpgrader.

The usual step of opening an old index and adding documents works (because new 
documents are always added to a new segment), but the much more usual 
IW.updateDocument(), which is commonly used to add documents as well, fails on 
old indexes. This is a no-go; we have to fix this. If we allow the trick of 
updating LiveDocs on the 3.x codec, the "read-only" stuff in the Lucene3x codec 
would be completely invisible to the end user, as he can do everything 
IndexWriter provides. The other horrible things like changing norms are no 
longer possible, so deletes are the only thing affected here. The 
read-only-ness of the Lucene3x codec would only be visible to the user when 
someone tries to explicitly create an index with the Lucene3x codec. And I 
understood CHANGES/MIGRATE.txt exactly like that.
{quote}
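
The difference between adding and updating can be sketched as follows. This 
is a toy model in plain Java, not Lucene code: ToyWriter and its fields are 
hypothetical names. It assumes only what is visible above: an update is a 
delete of the previous version plus an add, additions go to new segments, and 
(as the stack trace shows) deletes are applied during commit, so that is where 
the exception surfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy writer; it mimics the order of operations, not Lucene's API.
final class ToyWriter {
    // Documents in segments that still use the old, read-only format (id -> body).
    final Map<String, String> oldSegment = new HashMap<>();
    // Documents in segments written in the current format.
    final Map<String, String> newSegment = new HashMap<>();
    // Deletes are buffered and only applied at commit time.
    private final List<String> bufferedDeletes = new ArrayList<>();

    void addDocument(String id, String body) {
        newSegment.put(id, body); // additions always land in a new segment
    }

    void updateDocument(String id, String body) {
        bufferedDeletes.add(id); // delete the previous version ...
        addDocument(id, body);   // ... then add the replacement
    }

    void commit() {
        for (String id : bufferedDeletes) {
            if (oldSegment.containsKey(id)) {
                // Marking the old copy as deleted needs a write to the old segment.
                throw new UnsupportedOperationException(
                    "this codec can only be used for reading");
            }
        }
        bufferedDeletes.clear();
    }
}
```

In this model addDocument() followed by commit() always works, while 
updateDocument() for an id that lives in the old segment makes the next 
commit() throw.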

On the list, Robert added a simple patch, reverting the UOE in Lucene3xCodec, 
so the LiveDocs format is RW again.
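
The idea behind the patch can be illustrated with a small sketch, assuming a 
codec whose segment data stays read-only while only the deletion information 
becomes writable again. This is not the actual patch and not the real 
Lucene3xCodec: ToyLegacyCodec, its methods and the liveDocsWritable flag are 
hypothetical and exist only to contrast the two behaviours.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy codec; the real Lucene3xCodec looks different.
final class ToyLegacyCodec {
    private static final String READ_ONLY = "this codec can only be used for reading";

    // false models the behaviour with the UOE, true models the reverted behaviour.
    private final boolean liveDocsWritable;
    // Names of segments whose deletion info has been written.
    final Set<String> segmentsWithLiveDocs = new HashSet<>();

    ToyLegacyCodec(boolean liveDocsWritable) {
        this.liveDocsWritable = liveDocsWritable;
    }

    // Creating new segment data in the old format is refused in both variants.
    void writeSegment(String segmentName) {
        throw new UnsupportedOperationException(READ_ONLY);
    }

    // Only the handling of deletion info differs between the variants.
    void writeLiveDocs(String segmentName) {
        if (!liveDocsWritable) {
            throw new UnsupportedOperationException(READ_ONLY);
        }
        segmentsWithLiveDocs.add(segmentName);
    }
}
```

With new ToyLegacyCodec(true), deletes and updates go through, and the 
read-only nature only shows up when something tries to write a whole segment 
in the old format.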

