Tim, I see your point - those are good reasons to track status separately.
Deadlocks are "mostly harmless" because they are detected and one of the transactions is restarted. Other than the performance impact and the log messages, there should be no ill effects, so they are only a problem if they happen frequently. You'll see the activity in ErrorLog.txt if the log level is at Debug or lower.
The most common form of deadlock is when two identical operations run concurrently, both getting a read lock on a shared document (such as a single document with statuses for many other documents), and then each tries to upgrade its read lock to a write lock. While many transactions can share the read lock, only one can hold the write lock, per normal "readers-writer" lock behavior. So TX A tries to get the write lock but is blocked by TX B's read lock. TX B chugs along and also tries to get the write lock, but is blocked by TX A's read lock. That's a cycle, so they are deadlocked. One of them will be killed and restarted automatically.
Note that the locks are on entire documents (URIs, actually), so if a single document is used by many or all transactions, even with different elements for different statuses, there may be lock contention.
Yours,
Damon
--
Damon Feldman
Sr. Principal Consultant, MarkLogic
From: [email protected] [mailto:[email protected]] On Behalf Of Tim
Sent: Sunday, February 24, 2013 6:53 PM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Asyncronous Status Updates
Hi Damon,
I think the difference here is that although I am moving a document through multiple states, I am keeping a copy of the document at each state for historical purposes. In the past I have accomplished this by creating different directory URIs for an instance of the document at each state. Another reason I opt not to add the status to the document is that if the document gets deleted, I want a trail, which is why the status record is useful.
I do find CPF ideal for automated processing of a document that is entered into the database at various stages, but there will also simply be the need for a manual editing environment where the document will not advance in state until the user finally presses a Completed button. I am curious about the lock contention with high update volumes. It seems to me that the problem can present itself even with low transaction volumes, only with lower probability. Regarding deadlock, if the status record is being updated by asynchronous process 1 when asynchronous process 2 completes and wants to update it, it seems that this will be handled by the atomicity of each transaction. To prevent programmatic deadlock, I figure I will need one status element in my status record for each asynchronous event so I can track them independently.
Best,
Tim
From: [email protected]<mailto:[email protected]> [mailto:[email protected]] On Behalf Of Damon Feldman
Sent: Sunday, February 24, 2013 2:47 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Asyncronous Status Updates
Tim,
First, to your particular question: all updates in a single call to MarkLogic will be a single transaction (fully atomic) by default. So if you have a process (synchronous or asynchronous) that updates a document and also updates its status in another document, the two updates will happen atomically (all or nothing).
You may find that CPF (Content Processing Framework) already does what you need, however. With CPF you move a document through a set of states to represent a workflow, and state transitions trigger asynchronously. CPF has triggers to ensure actions happen, restores its state on system restart automatically (since asynchronous tasks on the task server do not persist across a restart), and tracks state in a properties fragment associated with each document. It tracks any errors or problems in the properties fragment too.
As a gotcha, be sure that if you have high update volumes you do not cause lock contention or deadlock-induced retries by updating a single status record for many documents. If, OTOH, you have low transaction volumes, you may want to put the status right on the document after all, since it's simpler, though it does incur slightly more write overhead.
Yours,
Damon
--
Damon Feldman
Sr. Principal Consultant, MarkLogic
From: [email protected]<mailto:[email protected]> [mailto:[email protected]] On Behalf Of Tim
Sent: Saturday, February 23, 2013 3:16 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Asyncronous Status Updates
Hi Folks,
I have a question about best practices for maintaining the state of a document. In a SQL world, I track document statuses using a control table. I find it useful to likewise track status separately from documents via a status record in MarkLogic so that, for example, I don't need to update a document when performing quality control. In addition, I can maintain a set of records to track the history of a document and refer to saved instances of the document at each touch point in a workflow, where I really do want to retain a copy of the document whenever a change has taken place, as referenced by the current state and document URI as well as other important information such as ownership, date/time stamp, etc.
However, there are some asynchronous back-end processing actions that can be taken on the document. These can be spawned concurrently, with an update made to the status table when each one completes. I want to make sure that I understand the concurrency issues related to updates to the status record. I think I can assume that there really won't be any need for a locking mechanism, that is, that each response will update the status table atomically. I plan to have separate statuses for each of the asynchronous events, as the completion of all such statuses will indicate that the record is ready for the next stage.
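As a rough illustration of the plan above, the per-event statuses and the readiness check might look like the following. This is plain Python rather than MarkLogic code, and the event names, the URI, and the StatusRecord class are all made up for the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical event names; the actual asynchronous steps are not named here.
ASYNC_EVENTS = ("event-1", "event-2", "event-3")


def _all_pending() -> dict:
    """Build a fresh status map with every event still pending."""
    return {event: "pending" for event in ASYNC_EVENTS}


@dataclass
class StatusRecord:
    """Status kept apart from the document: one entry per asynchronous event."""
    doc_uri: str
    events: dict = field(default_factory=_all_pending)

    def mark_complete(self, event: str) -> None:
        """Record that one asynchronous event has finished."""
        if event not in self.events:
            raise KeyError(f"unknown event: {event}")
        self.events[event] = "complete"

    def ready_for_next_stage(self) -> bool:
        """True only when every asynchronous event has completed."""
        return all(state == "complete" for state in self.events.values())


record = StatusRecord("/docs/example.xml")  # hypothetical URI
record.mark_complete("event-1")
partially_done = record.ready_for_next_stage()
record.mark_complete("event-2")
record.mark_complete("event-3")
fully_done = record.ready_for_next_stage()
```

Each event touches only its own entry, so the order in which the asynchronous processes finish does not matter for the final check.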
Thanks for any suggestions and insight into this! Tim
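For reference, the read-to-write upgrade deadlock Damon describes at the top of the thread can be sketched as a cycle in a wait-for graph. This is a small Python model under simplifying assumptions, not MarkLogic's lock manager; DocumentLock, has_cycle, and the URIs are hypothetical names used only for illustration.

```python
class DocumentLock:
    """Toy readers-writer lock on one document URI (read holders only)."""

    def __init__(self, uri):
        self.uri = uri
        self.readers = set()

    def acquire_read(self, tx):
        """Any number of transactions may share the read lock."""
        self.readers.add(tx)

    def blockers_for_write(self, tx):
        """Other transactions whose read locks block tx's upgrade to write."""
        return self.readers - {tx}


def has_cycle(wait_for):
    """Depth-first search for a cycle in a {tx: set of txs it waits on} graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in wait_for.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(wait_for))


# Case 1: TX A and TX B both read one shared status document, then both upgrade.
shared = DocumentLock("/status/all.xml")
shared.acquire_read("A")
shared.acquire_read("B")
shared_wait_for = {tx: shared.blockers_for_write(tx) for tx in ("A", "B")}
shared_deadlocked = has_cycle(shared_wait_for)

# Case 2: each transaction uses its own status document, so nothing blocks.
lock_a = DocumentLock("/status/doc-a.xml")
lock_b = DocumentLock("/status/doc-b.xml")
lock_a.acquire_read("A")
lock_b.acquire_read("B")
separate_wait_for = {
    "A": lock_a.blockers_for_write("A"),
    "B": lock_b.blockers_for_write("B"),
}
separate_deadlocked = has_cycle(separate_wait_for)
```

In the shared case A waits on B and B waits on A, which is the cycle that gets one transaction killed and restarted. With one status document per source document the wait-for graph has no edges, which is the motivation for not funnelling many documents' statuses through a single URI.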
_______________________________________________ General mailing list [email protected] http://developer.marklogic.com/mailman/listinfo/general
