[
https://issues.apache.org/jira/browse/LUCENE-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444344#comment-13444344
]
Hoss Man commented on LUCENE-4339:
----------------------------------
bq. And I had not realized our official back-compat policy all this time was
only ensuring reading (not writing/updating) old indices (thanks Robert)
I think there was a "bug" in the wording of that wiki page, because the
understanding and discussion in all past releases (including in the described
upgrade steps) was that Lucene would "read" your existing index and
automatically convert it as you made updates. A delete is a type of update.
bq. +1 to separate what we do for 4.0 vs a change to our back-compat policy. It
seems like (I have to think about the patch) doing this for 4.0 is easy ... but
that may not necessarily hold true for future releases.
Agreed -- if we want to say that 5.0 can "read" 4.0 but can't delete docs
that's fine, but let's worry about that when a situation where it actually
makes a difference in performance/simplicity/maintainability comes up. Hell,
if it really makes a big difference, I'm fine with saying that you *must* run
an upgrade tool for 5.0 to even read 4.0, but we shouldn't make those decisions
in the abstract; they should be based on actual implications.
bq. the fact that no users have complained (until now) about 4.0 disallowing
deletions on a 3.x index is telling.
Personally I think people are reading too much into this: I suspect most of
the people who have been using the alpha and beta so far are the more
"adventurous" devs, who are more likely to "rebuild the world" on upgrade
anyway. More "cautious" and "conservative" devs who will want to upgrade in
place are probably not that interested in looking at 4.0 until 4.0-final.
Bottom line: If we have a simple patch that will allow 4.0 to not only "read"
3.x indexes, but also easily "update" those indexes via deletes/merges, then I
say we commit it.
If enough folks feel strongly that this patch shouldn't be committed, and that
IndexUpgrader should be used by any user who might want to "delete" docs from a
3x index, then I would argue that we not only need to document this more
heavily, but we should also find some way to make the 4.0 IndexWriter "fail
fast" when you point it at an index that contains segments using the 3x codec.
We should not allow this time bomb situation where some doc updates/deletes
might work because the segments have already been merged/upgraded to the 4x
format, but other updates/deletes fail because the affected documents are
still in a 3x segment.
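To make the "fail fast" idea concrete, here is a minimal sketch in plain Java.
It is a toy model, not Lucene code: the Segment class and the legacySegments
and checkWritable methods are hypothetical names invented for illustration,
and a real check would have to read each segment's codec from the index's
segment metadata. The only point is that the writer refuses up front, instead
of failing later on whichever delete happens to touch a 3x segment.

```java
import java.util.ArrayList;
import java.util.List;

public class FailFastSketch {
    /** Toy stand-in for a segment: a name plus the codec that wrote it (hypothetical). */
    static final class Segment {
        final String name;
        final String codec;

        Segment(String name, String codec) {
            this.name = name;
            this.codec = codec;
        }
    }

    /** Returns the names of all segments still written by the given legacy codec. */
    static List<String> legacySegments(List<Segment> segments, String legacyCodec) {
        List<String> names = new ArrayList<>();
        for (Segment s : segments) {
            if (legacyCodec.equals(s.codec)) {
                names.add(s.name);
            }
        }
        return names;
    }

    /** Fail fast: refuse to open for writing if any legacy segment remains. */
    static void checkWritable(List<Segment> segments, String legacyCodec) {
        List<String> legacy = legacySegments(segments, legacyCodec);
        if (!legacy.isEmpty()) {
            throw new IllegalStateException(
                "index contains " + legacyCodec + " segments " + legacy
                + "; run IndexUpgrader before opening it for writing");
        }
    }

    public static void main(String[] args) {
        // A partially merged index: one old segment, one already rewritten.
        // The codec strings are used here only as illustrative labels.
        List<Segment> index = new ArrayList<>();
        index.add(new Segment("_0", "Lucene3x"));
        index.add(new Segment("_1", "Lucene40"));
        try {
            checkWritable(index, "Lucene3x");
            System.out.println("writable");
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

With a mixed index like the one in main, checkWritable throws as soon as the
index is opened, so the user learns about the problem at startup rather than
on some later delete.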
> Allow deletions on Lucene3x codecs again
> ----------------------------------------
>
> Key: LUCENE-4339
> URL: https://issues.apache.org/jira/browse/LUCENE-4339
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0-BETA
> Reporter: Uwe Schindler
> Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-4339.patch
>
>
> On dev@lao Hoss reported that a user in Solr was not able to update or delete
> documents in his 3.x index with Solr 4:
> {quote}
> On the solr-user list, Dirk Högemann recently mentioned a problem he was
> seeing when he tried upgrading his existing solr setup from 3.x to 4.0-BETA.
> Specifically this exception getting logged...
> http://find.searchhub.org/document/cdb30099bfea30c6
> auto commit error...:java.lang.UnsupportedOperationException: this codec can only be used for reading
>     at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
>     at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
>     at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
>     at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
>     at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
>     at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
>     at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
>     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
>     at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
> Dirk was able to work around this by completely re-indexing, but it seemed
> strange to me that this would happen.
> My understanding is that even though an IndexUpgrader tool was now available,
> it wasn't going to be required for users to use it when upgrading from 3.x to
> 4.x. Explicitly upgrading the index format might be a good idea, and might
> make the index more performant, but as I understood it, the way things had
> been implemented with codecs, explicitly upgrading the index format wasn't
> strictly necessary, and users should be able to upgrade their Lucene
> apps the same way that was supported with other index format upgrades in the
> past: the old index can be read, and as changes are made new segments will be
> re-written in the new format. (Note in
> particular: at the moment we don't mention IndexUpgrader in MIGRATE.txt at
> all.)
> It appears however, based on this stack trace and some other experiments I
> tried, that any attempt to "delete" documents in a segment that is using the
> Lucene3xCodec will fail.
> This seems like a really scary time bomb situation, because if you upgrade,
> things will seem to be working -- you can even add documents, and depending
> on the order that you do things, some "old" segments may get merged and use
> the new format, so *some* deletes of "old" documents (in those
> merged/upgraded segments) may work, but then somewhere down the road, you may
> try a delete that affects docs in a segment that has not yet been
> merged/upgraded, and that delete will fail -- 5 minutes later, if another
> merge has happened, attempting the exact same delete may succeed.
> All of which begs the question: is this a known/intended limitation of the
> Lucene3xCodec, or an oversight in the Lucene3xCodec?
> If it's expected, then it seems like we should definitely spell out this
> limitation in MIGRATE.txt and advocate either full rebuilds, or the use of
> IndexUpgrader for anyone whose indexes are non-static.
> On the Solr side of things, I think we should even consider
> automatically running IndexUpgrader on startup if we detect that the
> Lucene3xCodec is in use to simplify things -- we can't even suggest running
> "optimize" as a quick/easy way to force an index format upgrade because if
> the 3x index is already optimized then it's a no-op and the index stays in
> the 3x format.
> {quote}
> Robert said that this is an intended limitation (in fact it is explicitly
> added to the code; without that UOE it "simply works"), but I, like lots of
> other people, disagree:
> {quote}
> In the early days (I mean in the time when it was already read-only, until we
> refactored the IndexReader.delete()/Codec stuff), this was working, because
> the LiveDocs were always handled in a special way. Making it now 100%
> read-only is in my opinion very bad, as it no longer allows updating documents
> in a 3.x index, so you have no choice: you must run IndexUpgrader.
> The usual step of opening an old index and adding documents works (because
> new documents are always added to a new segment), but the much more common
> IW.updateDocument(), which is often also used to add documents, fails on old
> indexes. This is a no-go; we have to fix this. If we allow the trick with
> updating LiveDocs on the 3.x codec, the "read-only" stuff in the Lucene3x
> codec would be completely invisible to the end user, who can do everything
> IndexWriter provides. The other horrible things like changing norms are no
> longer possible, so deletes are the only thing affected here. The
> read-only-ness of the Lucene3x codec would only be visible to the user when
> someone tries to explicitly create an index with the Lucene3x codec. And I
> understood the CHANGES/MIGRATE.txt exactly as that.
> {quote}
> On the list, Robert added a simple patch, reverting the UOE in Lucene3xCodec,
> so the LiveDocs format is RW again.