[ 
https://issues.apache.org/jira/browse/SOLR-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258838#comment-13258838
 ] 

Per Steffensen commented on SOLR-3178:
--------------------------------------

bq. {quote}    it is hard to push too many small partly-done features. {quote}

bq. As committers it's the opposite - it's sometimes hard to consume patches 
that do a lot of different things.

I could imagine. Im sorry that it is possibly going to be a little bit like 
that this time. But I believe I will provide great stuff.

bq. {quote}    It works with multi-document update request, providing fully 
detailed feedback to the client about which document-updates failed (and why) 
and which succeeded. {quote}

bq. That's great! But it's also a separate feature that we've needed for a 
while (and I think there has been a JIRA issue open for it for a while).

Yes I agree. See SOLR-3382. The solution I will provide is generic in the way 
that is supports giving feedback to the client about "parts" of a request that 
failed. The client will know that the "parts" which are not reported to have 
failed did succeed. Usefull for any kind of request where parts of it can fail 
and other parts can succeed. The client provide a "partRef" for all parts, and 
partial errors from the server each contain a "partRef" in order to let the 
client know which part(s) failed. I only use it for multi-document (or 
batch-updates as I see you call it in come places) updates (here each document 
is a "part"), but I guess it could also be used for other things like
- Reporting which operations of a multi-operation (e.g. one containing both 
adds, deletes, commits, etc) update that failed.
- Reporting if a single update partly failed (e.g. if using replication and the 
update was successfull on leader but not on one of more of the replicas)
- Etc.

bq. {quote}    Looking shortly at you patch, I belive, that if two threads in 
the server JVM overlaps in a way that is unfortunate enough, then it is 
possible that data will not be stored or will be overwritten without an 
exception being thrown to indicate that to the client. {quote}

bq. I'm confident in the synchronization/concurrency part - it's just reusing 
what was put in place for SolrCloud to handle reordering of updates to replicas 
and is very well tested (see TestRealTimeGet). But please let us know if you 
see a problem with it, as that would mean a problem with SolrCloud today (even 
without this patch).

Looking closer at your code I would like to withdraw my concern about your 
solution. I didnt see that you are doing both "get existing version and check 
against provided version" and "store updated version" inside the same block 
governed by two sync-locks
- A readlock on a ReadWriteLock which is writelocked during deleteByQueries 
(your "vinfo.lockForUpdate"), protecting against concurent deletes (by query)
- A lock on the uniqueKey field (or a hash of it in order to be efficient) of 
the document (your "synchronized (bucket)"), protecting against concurrent 
updates and deletes (by id)

So basically I have repeated that locking-pattern in DirectUpdateHandler2 where 
I chose to place the "unique key constaint"- and "versioning"- check. Guess my 
locking can be removed if it is really necessary to have this kind of locking 
"this far up in the callstack" (in DistributedUpdateProcessor.versionAdd). For 
the purpose of "unique key constraint"- and "versioning"-check I believe it is 
only necessary to have this kind of locking around the actual updates in 
DirectUpdateHandler2 (when leader and not peer_sync) - and the smaller you can 
make synchronized blocks the better. But maybe you have reasons (related to 
SolrCloud) where you need the locking "this far up in the callstack".
But besides that fact that the locking I did in DirectUpdateHandler2 can be 
removed I really believe that the actual checking of "unique key constraint"- 
and "version"-violation logically belong in DirectUpdateHandler2, so I guess I 
will still try to convince you to take that part of my patch in. 
Besides that, I believe your solution just implements what I call 
semantics-mode "classic-consistency-hybrid" on 
http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics. I believe the 
"classic" mode and "consistency" mode are very nice to have in order to provide 
complete backward compabitility (classic mode) and a very strict consistency 
mode where it is not possible to "break the checks" by providing \_version\_ = 
0 (consistency mode).
Finally I believe my solution is a little better with respect to "design" and 
"separation of concerns", by isolatating the rules and rule-checks in a 
separate class instead of just coding it directly inside 
DistributedUpdateProcessor which already deals with a lot of other concerns, 
but that is another discussion, and should not count as the deciding argument.

bq. {quote}    Feedback to client is even possible if using 
ConcurrentSolrServer, because it has been changed to provide a Future from 
"request" - a Future that will eventually provide the calling client with the 
result from the request.{quote}

bq. This sounds like another great feature! It really deserves it's own issue 
so people can more easily provide review/feedback and not have it coupled to 
this issue.

Created the isolated issue - see SOLR-3383

{quote} Some other additions that can be handled later that I see:

    SolrJ support for easier passing of version on a delete, constants for 
version, etc {quote}

Well basically I have moved VERSION_FIELD = "\_version\_" from VersionInfo to 
SolrInputDocument.

bq.    Use of "1" as a generic "exists" version (i.e. update document only if 
it already exists)

Well I started out by wanting that feature (update only if exists, but no exact 
version check), but you couldnt see the good reason for that. Since then I 
realized that you are probably right and have removed that exact possibility - 
of course in classic and classic-consistency-hybrid modes you can still do 
updates without providing the exact version, but there is error if the document 
does not exist.

bq.    If one document in a batch fails, don't automatically abort, and provide 
info back about which docs succeeded and which failed (your first point).

That is part of my solution - basically the thing making it necessary to add 
the ability to send many errors back to the client - SOLR-3382

bq. That last one in particular needs some good discussion and design work and 
really deserves it's own issue.

Agree. Go discuss and comment on SOLR-3382

{quote} Having this current improvement committed in no way blocks any future 
improvements you may come up with (including deleting the code quoted above and 
using whatever method you have come up with for checking the versions), and it 
even uses the same API (or a subset of it) via version and 409. Progress, not 
perfection! {quote}

Agree, so I guess you could go commit, but that would just make my merge-task a 
little bigger, so I hope you will hold your horses a few days more.

bq. Are there any technical objections to this patch?

Except for the things I mention here and there in this response, no. And those 
are all things that could be fixed later, going nicely hand in hand with your 
great philosophy of "progress, not perfection".

bq. It really piggybacks on all of the SolrCloud work done to pass around and 
check versions, so it's really small.

Yes it is small and beautiful

{quote} The API is as follows:

    a) If you provide a version > 0 with an add or a delete, the update will 
fail with a 409 unless the provided version matches exactly the latest in the 
index for the ID.
    b) If you provide a version < 0 with an add or delete, the update will fail 
with a 409 if the ID exists in the index.
{quote}

You always provide the same error (except the versions in the message). You do 
not distinguish between DocumentAlreadyExist, DocumentDoesNotExist and 
VersionConflict errors as I do. Of course the client would know that it is a 
DocumentAlreadyExists if he sent a negative version number and got 409 (at 
least as long as 409 is not used for other kind of conflicts that can also 
happen when version < 0) and that it is DocumentDoesNotExist or VersionConflict 
if he sent a positive version number and got 409 (at least ...). But he really 
doesnt know if it is DocumentDoesNotExist or VersionConflict unless he parses 
the error message, and that might be important to know because he might want to 
react differently - e.g.
- On DocumentDoesNotExist: Either do nothing or go do a create (version < 0) of 
the document
- On VersionConflict: Go refetch the document from Solr, merge in his changes 
(automatically or by the help from a user) and try update again.

Of course you also need to parse error-type in my solution to know exactly what 
the problem is, but you need no "rules" on client side (e.g. knowing that if 
you sent version > 0 and you got version-already-in-store < 0 back, it is 
DocumentDoesNotExist), and in SolrJ my solution automatically parses error-type 
and throw corresponding exceptions.

Ad b) Maybe you dont want an error if you try to delete a document which has 
already been deleted, but probably ok

bq. That's the whole scope of this patch, and I believe is compatible with the 
larger scope of what Per will provide in the future.

Agree!

                
> Versioning - optimistic locking
> -------------------------------
>
>                 Key: SOLR-3178
>                 URL: https://issues.apache.org/jira/browse/SOLR-3178
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 3.5
>         Environment: All
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: RDBMS, insert, locking, nosql, optimistic, uniqueKey, 
> update, versioning
>             Fix For: 4.0
>
>         Attachments: SOLR-3178.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order increase the ability of Solr to be used as a NoSql database (lots of 
> concurrent inserts, updates, deletes and queries in the entire lifetime of 
> the index) instead of just a search index (first: everything indexed (in one 
> thread), after: only queries), I would like Solr to support versioning to be 
> used for optimistic locking.
> When my intent (see SOLR-3173) is to update an existing document, I will need 
> to provide a version-number equal to "the version number I got when I fetched 
> the existing document for update" plus one. If this provided version-number 
> does not correspond to "the newest version-number of that document at the 
> time of update" plus one, I will get a VersionConflict error. If it does 
> correspond the document will be updated with the new one, so that "the newest 
> version-number of that document" is NOW one higher than before the update. 
> Correct but efficient concurrency handling.
> When my intent (see SOLR-3173) is to insert a new document, the version 
> number provided will not be used - instead a version-number 0 will be used. 
> According to SOLR-3173 insert will only succeed if a document with the same 
> value on uniqueKey-field does not already exist.
> In general when talking about different versions of "the same document", of 
> course we need to be able to identify when a document "is the same" - that, 
> per definition, is when the values of the uniqueKey-fields are equal. 
> The functionality provided by this issue is only really meaningfull when you 
> run with "updateLog" activated.
> This issue might be solved more or less at the same time as SOLR-3173, and 
> only one single SVN patch might be given to cover both issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to