Hi Tim,
To follow on from what Damon is suggesting: you might also want to make the
status documents very small, so that each update is really an insert.
We've seen this approach a lot; accessors will aggregate the status
documents for examination, but each recorded fact is its own document.
"A document is a row, not a table." There is no update lock contention in
this scenario, and you can choose the transaction isolation level, or
synchronous versus asynchronous processing, to address other requirements in the system.
Partitioning these status documents from the rest of the database, by
collection or directory, would help organize them.
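
The insert-only approach above can be sketched as a toy in-memory model. This is plain Python, not MarkLogic API; the class names, the "/status..." URI layout, and the event names are assumptions made up for illustration. Each recorded fact gets its own URI, so writers never touch the same document, and readers aggregate the facts when they need the current picture.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass(frozen=True)
class StatusFact:
    uri: str              # URI of this status document (hypothetical layout)
    doc_uri: str          # document the fact is about
    event: str            # which process reported, e.g. "ocr"
    state: str            # what it reported, e.g. "done"
    recorded_at: datetime


class StatusStore:
    """Toy stand-in for a database directory holding small status documents."""

    def __init__(self):
        self._docs = {}
        self._seq = count(1)

    def record(self, doc_uri, event, state):
        # Every call creates a new URI: an insert, never an update.
        uri = f"/status{doc_uri}/{next(self._seq)}.xml"
        fact = StatusFact(uri, doc_uri, event, state, datetime.now(timezone.utc))
        self._docs[uri] = fact
        return fact

    def current(self, doc_uri):
        # Readers aggregate: the most recently inserted fact per event wins.
        latest = {}
        for fact in self._docs.values():  # dicts keep insertion order
            if fact.doc_uri == doc_uri:
                latest[fact.event] = fact.state
        return latest
```

Because old facts are never overwritten, the same store also serves as the history trail.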
Charles
On 02/24/2013 05:16 PM, Damon Feldman wrote:
Tim,
I see your point - those are good reasons to track status separately.
Deadlocks are "mostly harmless" because they are detected and one of the
transactions will be restarted. Other than performance impacts and log messages there
should be no ill effects. So they are only a problem if they are somewhat frequent.
You'll see the activity in ErrorLog.txt if the log level is at Debug or lower.
The most common form of deadlock is when two identical operations run concurrently, both
getting a read lock on a shared document (such as a single document with statuses for
many other documents), and then each tries to upgrade the read lock to a write lock. While
many transactions can share the read lock, only one can have the write lock, per normal
"readers-writer" lock behavior. So TX A tries to get the write lock but is
blocked by TX B's read lock. TX B chugs along and also tries to get the write lock, but
is blocked by TX A's read lock. That's a cycle, so they are deadlocked. One of them
will be killed and restarted automatically.
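
The upgrade cycle described above can be modeled as a small wait-for graph. This is a sketch in plain Python, not how MarkLogic implements deadlock detection; the function names are hypothetical. A transaction that wants the write lock waits for every other holder of the read lock, and a cycle in that graph is a deadlock.

```python
def upgrade_waits(readers):
    """Wait-for graph when every read-lock holder tries to upgrade to a write lock.

    Each transaction is blocked by all the *other* read-lock holders.
    """
    return {tx: set(readers) - {tx} for tx in readers}


def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    visiting, done = set(), set()

    def visit(tx, path):
        if tx in visiting:
            return path[path.index(tx):]   # came back to a tx on the current path
        if tx in done:
            return None
        visiting.add(tx)
        for other in sorted(waits_for.get(tx, ())):
            cycle = visit(other, path + [tx])
            if cycle:
                return cycle
        visiting.discard(tx)
        done.add(tx)
        return None

    for tx in sorted(waits_for):
        cycle = visit(tx, [])
        if cycle:
            return cycle
    return None


# Two transactions share a read lock on one status document, then both upgrade.
shared = upgrade_waits({"A", "B"})
# One transaction per status document: nobody else holds the read lock.
separate = upgrade_waits({"A"})
```

Here `find_cycle(shared)` finds the A/B cycle, while `find_cycle(separate)` finds none, which is the case for one small status document per update.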
Note that the locks are on entire documents (URIs actually) so if a single
document is used by many/all transactions, even with different elements for
different statuses, there may be lock contention.
Yours,
Damon
--
Damon Feldman
Sr. Principal Consultant, MarkLogic
From: [email protected]
[mailto:[email protected]] On Behalf Of Tim
Sent: Sunday, February 24, 2013 6:53 PM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Asyncronous Status Updates
Hi Damon,
I think the difference here is that although I am moving a document through
multiple states, I am keeping a copy of the document at each state for
historical purposes. In the past I have accomplished this by creating
different directory URIs for an instance of the document at each state.
Another reason that I opt not to add the status to the document is that if it
gets deleted, I want a trail, which is why the status record is useful.
I do find CPFs ideal for automated processing of a document that is entered
into the database at various stages, but there will also simply be the need for
a manual editing environment where the document will not advance in state until
the user finally presses a Completed button.
I am curious about the lock contention with high update volumes. It seems to
me that the problem can present itself even with low transaction volumes, just
that the probability is lower. Regarding deadlock, if the status record is
being updated by one asynchronous process 1 when another asynchronous process 2
completes and wants to update it, it seems that is going to be handled by the
atomicity of each transaction. To prevent programmatic deadlock, I figure I
will need to have one status element in my status record for each asynchronous
event so I can track them independently.
Best,
Tim
From:
[email protected]<mailto:[email protected]>
[mailto:[email protected]] On Behalf Of Damon Feldman
Sent: Sunday, February 24, 2013 2:47 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Asyncronous Status Updates
Tim,
First, to your particular question: all updates in a single call to MarkLogic
will be a single transaction (fully atomic) by default. So if you have a
process (synchronous or asynchronous) that updates a document and also updates
its status in another document, they will happen atomically (all or nothing).
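
The all-or-nothing behavior described above can be illustrated with a toy model. This is plain Python, not MarkLogic API, and it says nothing about how the server implements transactions; `ToyDatabase`, `advance`, and the URIs are hypothetical. Writes are staged on a copy and become visible together only if the whole update succeeds.

```python
class ToyDatabase:
    """In-memory stand-in for a database; keys are document URIs."""

    def __init__(self):
        self.docs = {}

    def run_transaction(self, update):
        staged = dict(self.docs)   # work on a private copy
        update(staged)             # if this raises, nothing is committed
        self.docs = staged         # commit: all writes appear at once


def advance(doc_uri, status_uri, new_state, fail=False):
    """Build an update that writes a document and its separate status record."""

    def update(docs):
        docs[doc_uri] = f"<doc state='{new_state}'/>"
        if fail:
            raise RuntimeError("simulated failure between the two writes")
        docs[status_uri] = new_state

    return update
```

If the update fails after the first write, neither the document nor its status record changes, so the two can never disagree.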
You may find that CPF (content processing framework) already does what you
need, however. With CPF you move a document through a set of states to
represent a workflow and state transitions trigger asynchronously. CPF has
triggers to ensure actions happen, restores its state on system restart
automatically (since asynchronous tasks on the task server do not persist
across a restart), and tracks state in a properties fragment associated with
each document. It tracks any errors or problems in the properties fragment too.
As a gotcha, be sure that if you have high update volumes you do not cause lock
contention or deadlock-induced retries by updating a single status record for
many documents. If, OTOH, you have low transaction volumes, you may want to put
the status right on the document after all, since it's simpler but does incur
slightly more write overhead.
Yours,
Damon
--
Damon Feldman
Sr. Principal Consultant, MarkLogic
From:
[email protected]<mailto:[email protected]>
[mailto:[email protected]] On Behalf Of Tim
Sent: Saturday, February 23, 2013 3:16 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Asyncronous Status Updates
Hi Folks,
I have a question about best practices for maintaining the state of a document.
In a SQL world, I track document statuses using a control table. I find it
useful to likewise track status separately from documents via a status record
in MarkLogic so that for example, I don't need to update a document when
performing quality control. In addition, I can maintain a set of records to
track the history of a document and refer to saved instances of the document at
each touch point in a workflow where I really do want to retain a copy of the
document whenever a change has taken place as referenced by the current state
and document URI as well as other important information such as ownership,
date/time stamp, etc.
However, there are some asynchronous back-end processing actions that can be
taken on the document which can be spawned concurrently with updates made to
the status table when each completes. I want to make sure that I understand
the concurrency issues related to updates to the status record. I think I can
assume that there really won't be any need for a locking mechanism, that is
that each response will update the status table atomically. I plan to have
separate statuses for each of the asynchronous events as the completion of all
such statuses will indicate that the record is ready for the next stage.
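
A minimal sketch of that readiness check, in plain Python rather than MarkLogic code. The event names and the "complete" state value are assumptions for illustration; the statuses mapping stands for whatever per-event statuses have been recorded for one document.

```python
# Hypothetical set of asynchronous events that must all finish.
REQUIRED_EVENTS = frozenset({"virus-scan", "ocr", "index"})


def ready_for_next_stage(statuses, required=REQUIRED_EVENTS):
    """True only when every required event has reported 'complete'.

    `statuses` maps event name -> latest recorded state for one document.
    A missing event counts as not complete.
    """
    return all(statuses.get(event) == "complete" for event in required)
```

Because each event has its own status, a late or failed process simply leaves its entry missing or incomplete, and the document stays in its current stage.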
Thanks for any suggestions and insight into this!
Tim
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
--
Charles Greer
Senior Engineer
MarkLogic Corporation
[email protected]
Phone: +1 707 408 3277
www.marklogic.com