Re: [MarkLogic Dev General] Asyncronous Status Updates

Charles Greer Mon, 25 Feb 2013 09:39:39 -0800

Hi Tim,

To follow on what Damon's suggesting: you might also want to make the status documents very small, such that each update is really an insert. We've seen this approach a lot; accessors will aggregate the status documents for examination, but each recorded fact is its own document. "A document is a row, not a table." No update locking contention in this scenario, and you can choose the transaction isolation or synchronous/asynch to address other requirements in the system.

Partitioning these status documents from the rest of the database, by collection or directory, would help organize them.
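
A minimal sketch of the insert-only pattern described above, written in Python against an in-memory dictionary rather than MarkLogic itself. The `StatusStore` class, the `/status/` URI prefix, and the field names are all hypothetical, chosen only for illustration. Each recorded fact gets its own URI, so two writers never touch the same document, and the current state is assembled at read time.

```python
from itertools import count


class StatusStore:
    """Hypothetical in-memory stand-in for a database keyed by URI."""

    def __init__(self):
        self.documents = {}          # URI -> small status document
        self._sequence = count(1)    # source of unique URI suffixes

    def record(self, doc_uri, event, state):
        """Insert one new status document; existing ones are never updated."""
        seq = next(self._sequence)
        status_uri = f"/status/{seq}.json"   # the shared prefix acts as the partition
        self.documents[status_uri] = {
            "doc": doc_uri, "event": event, "state": state, "seq": seq,
        }
        return status_uri

    def current(self, doc_uri):
        """Aggregate at read time: the latest recorded state per event."""
        facts = [d for d in self.documents.values() if d["doc"] == doc_uri]
        latest = {}
        for fact in sorted(facts, key=lambda d: d["seq"]):
            latest[fact["event"]] = fact["state"]
        return latest


if __name__ == "__main__":
    store = StatusStore()
    store.record("/docs/a.xml", "ocr", "started")
    store.record("/docs/a.xml", "index", "done")
    store.record("/docs/a.xml", "ocr", "done")
    print(store.current("/docs/a.xml"))
```

The history stays intact because nothing is overwritten: all three status documents remain in the store, and only the reader decides which fact is current.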


Charles


On 02/24/2013 05:16 PM, Damon Feldman wrote:

Tim,

I see your point - those are good reasons to track status separately.

Deadlocks are "mostly harmless" because they are detected and one of the 
transactions will be restarted. Other than performance impacts and log messages there 
should be no ill effects. So they are only a problem if they are somewhat frequent. 
You'll see the activity in ErrorLog.txt if the log level is at Debug or lower.

The most common form of deadlock is when two identical operations run concurrently, both 
getting a read lock on a shared document (such as a single document with statuses for 
many other documents) and then each tries to upgrade the read lock to a write lock. While 
many transactions can share the read lock, only one can have the write lock per normal 
"readers writer" lock behavior. So TX A tries to get the write lock but is 
blocked by TX B's read lock. TX B chugs along and tries to also get the write lock, but 
is blocked by TX A's read lock. That's a cycle, so then they are deadlocked. One of them 
will be killed and re-started automatically.
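
The lock-upgrade cycle described above can be modeled as a wait-for graph. The Python sketch below is an illustration only, not MarkLogic's actual deadlock detector, and the function names are hypothetical. It assumes a transaction requesting the write lock must wait for every other transaction that still holds the read lock on the same URI.

```python
def wait_for_graph(read_holders, write_requesters):
    """Map each write requester to the other read-lock holders it waits on."""
    return {
        tx: {holder for holder in read_holders if holder != tx}
        for tx in write_requesters
    }


def find_cycle(graph):
    """Return a list of transactions forming a cycle, or None if there is none."""
    visiting, done = [], set()

    def visit(tx):
        if tx in done:
            return None
        if tx in visiting:
            # We came back to a transaction on the current path: a cycle.
            return visiting[visiting.index(tx):]
        visiting.append(tx)
        for other in sorted(graph.get(tx, ())):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(tx)
        return None

    for tx in sorted(graph):
        cycle = visit(tx)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    # TX A and TX B both hold the read lock, and both ask for the write lock.
    both_upgrade = wait_for_graph({"A", "B"}, {"A", "B"})
    print(find_cycle(both_upgrade))

    # Only TX A asks for the write lock: it waits for B, but nothing is stuck.
    one_upgrade = wait_for_graph({"A", "B"}, {"A"})
    print(find_cycle(one_upgrade))
```

In the first case each transaction waits on the other, so a cycle is found and one of them has to be restarted. In the second case TX A simply waits until TX B releases its read lock.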

Note that the locks are on entire documents (URIs actually) so if a single 
document is used by many/all transactions, even with different elements for 
different statuses, there may be lock contention.
Yours,
Damon

--
Damon Feldman
Sr. Principal Consultant, MarkLogic


From: [email protected] 
[mailto:[email protected]] On Behalf Of Tim
Sent: Sunday, February 24, 2013 6:53 PM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Asyncronous Status Updates

Hi Damon,

I think the difference here is that although I am moving a document through 
multiple states, I am keeping a copy of the document at each state for 
historical purposes.  In the past I have accomplished this by creating 
different directory URIs for an instance of the document at each state.  
Another reason that I opt to not add the status to the document is that if it 
gets deleted, I want a trail which is why the status record is useful.

I do find CPFs ideal for automated processing of a document that is entered 
into the database at various stages, but there will also simply be the need for 
a manual editing environment where the document will not advance in state until 
the user finally presses a Completed button.

I am curious about the lock contention with high update volumes.  It seems to 
me that the problem can present itself even with low transaction volumes, just 
that the probability is lower.  Regarding deadlock, if the status record is 
being updated by one asynchronous process 1 when another asynchronous process 2 
completes and wants to update it, it seems that is going to be handled by the 
atomicity of each transaction.  To prevent programmatic deadlock, I figure I 
will need to have one status element in my status record for each asynchronous 
event so I can track them independently.

Best,

Tim

From: 
[email protected]<mailto:[email protected]>
 [mailto:[email protected]] On Behalf Of Damon Feldman
Sent: Sunday, February 24, 2013 2:47 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Asyncronous Status Updates

Tim,

First, to your particular question: all updates in a single call to MarkLogic 
will be a single transaction (fully atomic) by default. So if you have a 
process (synchronous or asynchronous) that updates a document and also updates 
its status in another document, they will happen atomically (all or nothing).
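
As a rough model of the all-or-nothing behavior described above, the Python sketch below stages writes and applies them together only if the whole block succeeds. This is not how MarkLogic implements transactions; the `Transaction` class and the URIs are hypothetical and exist only to make the guarantee concrete.

```python
class Transaction:
    """Hypothetical staged-write transaction over a dict of URI -> document."""

    def __init__(self, database):
        self.database = database
        self.pending = {}            # writes staged but not yet visible

    def write(self, uri, document):
        self.pending[uri] = document

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, traceback):
        if exc_type is None:
            # Commit: every staged write becomes visible together.
            self.database.update(self.pending)
        # On error nothing is applied; returning False re-raises the exception.
        return False


if __name__ == "__main__":
    database = {}
    with Transaction(database) as tx:
        tx.write("/docs/a.xml", "<doc>edited</doc>")
        tx.write("/status/a.xml", "<status>edited</status>")
    print(sorted(database))
```

If an exception is raised inside the `with` block, neither the document nor its status record is written, which is the property that makes a separate locking mechanism unnecessary for a single request.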

You may find that CPF (content processing framework) already does what you 
need, however. With CPF you move a document through a set of states to 
represent a workflow and state transitions trigger asynchronously. CPF has 
triggers to ensure actions happen, restores its state on system restart 
automatically (since asynchronous tasks on the task server do not persist 
across a restart), and tracks state in a properties fragment associated with 
each document. It tracks any errors or problems in the properties fragment too.

As a gotcha, be sure that if you have high update volumes you do not cause lock 
contention or deadlock-induced retries by updating a single status record for 
many documents. If, OTOH, you have low transaction volumes, you may want to put 
the status right on the document after all, since it's simpler but does incur 
slightly more write overhead.

Yours,
Damon

--
Damon Feldman
Sr. Principal Consultant, MarkLogic


From: 
[email protected]<mailto:[email protected]>
 [mailto:[email protected]] On Behalf Of Tim
Sent: Saturday, February 23, 2013 3:16 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Asyncronous Status Updates

Hi Folks,

I have a question about best practices for maintaining the state of a document. 
 In a SQL world, I track document statuses using a control table.  I find it 
useful to likewise track status separately from documents via a status record 
in MarkLogic so that for example, I don't need to update a document when 
performing quality control.  In addition, I can maintain a set of records to 
track the history of a document and refer to saved instances of the document at 
each touch point in a workflow where I really do want to retain a copy of the 
document whenever a change has taken place as referenced by the current state 
and document URI as well as other important information such as ownership, 
date/time stamp, etc.

However, there are some asynchronous back-end processing actions that can be 
taken on the document which can be spawned concurrently with updates made to 
the status table when each completes.  I want to make sure that I understand 
the concurrency issues related to updates to the status record.  I think I can 
assume that there really won't be any need for a locking mechanism, that is 
that each response will update the status table atomically.   I plan to have 
separate statuses for each of the asynchronous events as the completion of all 
such statuses will indicate that the record is ready for the next stage.

Thanks for any suggestions and insight into this!

Tim




_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general


--
Charles Greer
Senior Engineer
MarkLogic Corporation
[email protected]
Phone: +1 707 408 3277
www.marklogic.com
