[
https://issues.apache.org/jira/browse/SOLR-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258838#comment-13258838
]
Per Steffensen commented on SOLR-3178:
--------------------------------------
bq. {quote} it is hard to push too many small partly-done features. {quote}
bq. As committers it's the opposite - it's sometimes hard to consume patches
that do a lot of different things.
I could imagine. Im sorry that it is possibly going to be a little bit like
that this time. But I believe I will provide great stuff.
bq. {quote} It works with multi-document update request, providing fully
detailed feedback to the client about which document-updates failed (and why)
and which succeeded. {quote}
bq. That's great! But it's also a separate feature that we've needed for a
while (and I think there has been a JIRA issue open for it for a while).
Yes I agree. See SOLR-3382. The solution I will provide is generic in the way
that is supports giving feedback to the client about "parts" of a request that
failed. The client will know that the "parts" which are not reported to have
failed did succeed. Usefull for any kind of request where parts of it can fail
and other parts can succeed. The client provide a "partRef" for all parts, and
partial errors from the server each contain a "partRef" in order to let the
client know which part(s) failed. I only use it for multi-document (or
batch-updates as I see you call it in come places) updates (here each document
is a "part"), but I guess it could also be used for other things like
- Reporting which operations of a multi-operation (e.g. one containing both
adds, deletes, commits, etc) update that failed.
- Reporting if a single update partly failed (e.g. if using replication and the
update was successfull on leader but not on one of more of the replicas)
- Etc.
bq. {quote} Looking shortly at you patch, I belive, that if two threads in
the server JVM overlaps in a way that is unfortunate enough, then it is
possible that data will not be stored or will be overwritten without an
exception being thrown to indicate that to the client. {quote}
bq. I'm confident in the synchronization/concurrency part - it's just reusing
what was put in place for SolrCloud to handle reordering of updates to replicas
and is very well tested (see TestRealTimeGet). But please let us know if you
see a problem with it, as that would mean a problem with SolrCloud today (even
without this patch).
Looking closer at your code I would like to withdraw my concern about your
solution. I didnt see that you are doing both "get existing version and check
against provided version" and "store updated version" inside the same block
governed by two sync-locks
- A readlock on a ReadWriteLock which is writelocked during deleteByQueries
(your "vinfo.lockForUpdate"), protecting against concurent deletes (by query)
- A lock on the uniqueKey field (or a hash of it in order to be efficient) of
the document (your "synchronized (bucket)"), protecting against concurrent
updates and deletes (by id)
So basically I have repeated that locking-pattern in DirectUpdateHandler2 where
I chose to place the "unique key constaint"- and "versioning"- check. Guess my
locking can be removed if it is really necessary to have this kind of locking
"this far up in the callstack" (in DistributedUpdateProcessor.versionAdd). For
the purpose of "unique key constraint"- and "versioning"-check I believe it is
only necessary to have this kind of locking around the actual updates in
DirectUpdateHandler2 (when leader and not peer_sync) - and the smaller you can
make synchronized blocks the better. But maybe you have reasons (related to
SolrCloud) where you need the locking "this far up in the callstack".
But besides that fact that the locking I did in DirectUpdateHandler2 can be
removed I really believe that the actual checking of "unique key constraint"-
and "version"-violation logically belong in DirectUpdateHandler2, so I guess I
will still try to convince you to take that part of my patch in.
Besides that, I believe your solution just implements what I call
semantics-mode "classic-consistency-hybrid" on
http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics. I believe the
"classic" mode and "consistency" mode are very nice to have in order to provide
complete backward compabitility (classic mode) and a very strict consistency
mode where it is not possible to "break the checks" by providing \_version\_ =
0 (consistency mode).
Finally I believe my solution is a little better with respect to "design" and
"separation of concerns", by isolatating the rules and rule-checks in a
separate class instead of just coding it directly inside
DistributedUpdateProcessor which already deals with a lot of other concerns,
but that is another discussion, and should not count as the deciding argument.
bq. {quote} Feedback to client is even possible if using
ConcurrentSolrServer, because it has been changed to provide a Future from
"request" - a Future that will eventually provide the calling client with the
result from the request.{quote}
bq. This sounds like another great feature! It really deserves it's own issue
so people can more easily provide review/feedback and not have it coupled to
this issue.
Created the isolated issue - see SOLR-3383
{quote} Some other additions that can be handled later that I see:
SolrJ support for easier passing of version on a delete, constants for
version, etc {quote}
Well basically I have moved VERSION_FIELD = "\_version\_" from VersionInfo to
SolrInputDocument.
bq. Use of "1" as a generic "exists" version (i.e. update document only if
it already exists)
Well I started out by wanting that feature (update only if exists, but no exact
version check), but you couldnt see the good reason for that. Since then I
realized that you are probably right and have removed that exact possibility -
of course in classic and classic-consistency-hybrid modes you can still do
updates without providing the exact version, but there is error if the document
does not exist.
bq. If one document in a batch fails, don't automatically abort, and provide
info back about which docs succeeded and which failed (your first point).
That is part of my solution - basically the thing making it necessary to add
the ability to send many errors back to the client - SOLR-3382
bq. That last one in particular needs some good discussion and design work and
really deserves it's own issue.
Agree. Go discuss and comment on SOLR-3382
{quote} Having this current improvement committed in no way blocks any future
improvements you may come up with (including deleting the code quoted above and
using whatever method you have come up with for checking the versions), and it
even uses the same API (or a subset of it) via version and 409. Progress, not
perfection! {quote}
Agree, so I guess you could go commit, but that would just make my merge-task a
little bigger, so I hope you will hold your horses a few days more.
bq. Are there any technical objections to this patch?
Except for the things I mention here and there in this response, no. And those
are all things that could be fixed later, going nicely hand in hand with your
great philosophy of "progress, not perfection".
bq. It really piggybacks on all of the SolrCloud work done to pass around and
check versions, so it's really small.
Yes it is small and beautiful
{quote} The API is as follows:
a) If you provide a version > 0 with an add or a delete, the update will
fail with a 409 unless the provided version matches exactly the latest in the
index for the ID.
b) If you provide a version < 0 with an add or delete, the update will fail
with a 409 if the ID exists in the index.
{quote}
You always provide the same error (except the versions in the message). You do
not distinguish between DocumentAlreadyExist, DocumentDoesNotExist and
VersionConflict errors as I do. Of course the client would know that it is a
DocumentAlreadyExists if he sent a negative version number and got 409 (at
least as long as 409 is not used for other kind of conflicts that can also
happen when version < 0) and that it is DocumentDoesNotExist or VersionConflict
if he sent a positive version number and got 409 (at least ...). But he really
doesnt know if it is DocumentDoesNotExist or VersionConflict unless he parses
the error message, and that might be important to know because he might want to
react differently - e.g.
- On DocumentDoesNotExist: Either do nothing or go do a create (version < 0) of
the document
- On VersionConflict: Go refetch the document from Solr, merge in his changes
(automatically or by the help from a user) and try update again.
Of course you also need to parse error-type in my solution to know exactly what
the problem is, but you need no "rules" on client side (e.g. knowing that if
you sent version > 0 and you got version-already-in-store < 0 back, it is
DocumentDoesNotExist), and in SolrJ my solution automatically parses error-type
and throw corresponding exceptions.
Ad b) Maybe you dont want an error if you try to delete a document which has
already been deleted, but probably ok
bq. That's the whole scope of this patch, and I believe is compatible with the
larger scope of what Per will provide in the future.
Agree!
> Versioning - optimistic locking
> -------------------------------
>
> Key: SOLR-3178
> URL: https://issues.apache.org/jira/browse/SOLR-3178
> Project: Solr
> Issue Type: New Feature
> Components: update
> Affects Versions: 3.5
> Environment: All
> Reporter: Per Steffensen
> Assignee: Per Steffensen
> Labels: RDBMS, insert, locking, nosql, optimistic, uniqueKey,
> update, versioning
> Fix For: 4.0
>
> Attachments: SOLR-3178.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> In order increase the ability of Solr to be used as a NoSql database (lots of
> concurrent inserts, updates, deletes and queries in the entire lifetime of
> the index) instead of just a search index (first: everything indexed (in one
> thread), after: only queries), I would like Solr to support versioning to be
> used for optimistic locking.
> When my intent (see SOLR-3173) is to update an existing document, I will need
> to provide a version-number equal to "the version number I got when I fetched
> the existing document for update" plus one. If this provided version-number
> does not correspond to "the newest version-number of that document at the
> time of update" plus one, I will get a VersionConflict error. If it does
> correspond the document will be updated with the new one, so that "the newest
> version-number of that document" is NOW one higher than before the update.
> Correct but efficient concurrency handling.
> When my intent (see SOLR-3173) is to insert a new document, the version
> number provided will not be used - instead a version-number 0 will be used.
> According to SOLR-3173 insert will only succeed if a document with the same
> value on uniqueKey-field does not already exist.
> In general when talking about different versions of "the same document", of
> course we need to be able to identify when a document "is the same" - that,
> per definition, is when the values of the uniqueKey-fields are equal.
> The functionality provided by this issue is only really meaningfull when you
> run with "updateLog" activated.
> This issue might be solved more or less at the same time as SOLR-3173, and
> only one single SVN patch might be given to cover both issues.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]