Hi Damon,

 

I think the difference here is that although I am moving a document through
multiple states, I am keeping a copy of the document at each state for
historical purposes.  In the past I have accomplished this by creating a
different directory URI for the instance of the document at each state.
Another reason I opt not to put the status on the document itself is that
if the document gets deleted, I still want a trail, which is why a separate
status record is useful.

 

I do find CPFs ideal for automated processing of a document that is entered
into the database at various stages, but there will also simply be the need
for a manual editing environment where the document will not advance in
state until the user finally presses a Completed button.

 

I am curious about the lock contention with high update volumes.  It seems
to me that the problem can present itself even with low transaction volumes,
just with lower probability.  Regarding deadlock, if the status record is
being updated by asynchronous process 1 when asynchronous process 2
completes and wants to update it, it seems that this will be handled by the
atomicity of each transaction.  To prevent programmatic deadlock, I figure I
will need one status element in my status record for each asynchronous
event so I can track them independently.
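
A minimal sketch of the per-event status idea, written in plain Python
rather than against MarkLogic. The StatusRecord class, the event names, and
the lock are all hypothetical; the lock only stands in for the atomicity the
database would give each update transaction. Each asynchronous event owns
its own status entry, and the record is ready for the next stage once every
entry is complete.

```python
import threading


class StatusRecord:
    """Hypothetical in-memory stand-in for a status record."""

    def __init__(self, events):
        # The lock stands in for per-transaction atomicity in the database.
        self._lock = threading.Lock()
        # One independent status entry per asynchronous event.
        self._status = {name: "pending" for name in events}

    def mark_complete(self, event):
        # Each completing process touches only its own entry.
        with self._lock:
            if event not in self._status:
                raise KeyError(event)
            self._status[event] = "complete"

    def ready_for_next_stage(self):
        # The document may advance only when every event has completed.
        with self._lock:
            return all(state == "complete" for state in self._status.values())


def run_event(record, event):
    # Real back-end work would happen here before the status update.
    record.mark_complete(event)


EVENTS = ["index", "thumbnail", "validate"]  # assumed event names
record = StatusRecord(EVENTS)
workers = [threading.Thread(target=run_event, args=(record, e)) for e in EVENTS]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(record.ready_for_next_stage())
```

Because no process ever writes another process's entry, the order in which
the events finish does not matter for the final readiness check.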

 

Best,

 

Tim

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Damon Feldman
Sent: Sunday, February 24, 2013 2:47 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Asyncronous Status Updates

 

Tim,

 

First, to your particular question: all updates in a single call to
MarkLogic will be a single transaction (fully atomic) by default. So if you
have a process (synchronous or asynchronous) that updates a document and
also updates its status in another document, they will happen atomically
(all or nothing).
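
The all-or-nothing behavior described above can be illustrated with a small
in-memory simulation. This is plain Python, not MarkLogic code: TinyStore,
its transact method, and the URIs are hypothetical names invented for the
sketch. All writes go to a working copy and are published together, so a
failure partway through leaves both the document and its status unchanged.

```python
import copy


class TinyStore:
    """Hypothetical in-memory store; not the MarkLogic API."""

    def __init__(self):
        self.docs = {}

    def transact(self, update_fn):
        # Apply every write to a working copy first.
        working = copy.deepcopy(self.docs)
        update_fn(working)  # an exception here leaves self.docs untouched
        # Publish all writes together.
        self.docs = working


def save_with_status(docs):
    # Document and status record are written in the same transaction.
    docs["/docs/report.xml"] = "<report>draft</report>"
    docs["/status/report.xml"] = {"state": "draft"}


def failing_update(docs):
    docs["/docs/report.xml"] = "<report>final</report>"
    raise RuntimeError("simulated failure before the status write")


store = TinyStore()
store.transact(save_with_status)

try:
    store.transact(failing_update)
except RuntimeError:
    pass  # neither the document nor the status changed
```

After the failed call, the store still holds the draft document and the
draft status, never a final document paired with a stale status.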

 

You may find that CPF (content processing framework) already does what you
need, however. With CPF you move a document through a set of states to
represent a workflow and state transitions trigger asynchronously. CPF has
triggers to ensure actions happen, restores its state on system restart
automatically (since asynchronous tasks on the task server do not persist
across a restart), and tracks state in a properties fragment associated with
each document. It tracks any errors or problems in the properties fragment
too.

 

As a gotcha, be sure that if you have high update volumes you do not cause
lock contention or deadlock-induced retries by updating a single status
record for many documents. If, OTOH, you have low transaction volumes, you
may want to put the status right on the document after all, since that is
simpler, though it does incur slightly more write overhead.

 

Yours,

Damon

 

--

Damon Feldman

Sr. Principal Consultant, MarkLogic

 

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Tim
Sent: Saturday, February 23, 2013 3:16 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Asyncronous Status Updates

 

Hi Folks,

 

I have a question about best practices for maintaining the state of a
document.  In a SQL world, I track document statuses using a control table.
I find it useful to likewise track status separately from documents via a
status record in MarkLogic so that, for example, I don't need to update a
document when performing quality control.  In addition, I can maintain a set
of records that track the history of a document and refer to saved instances
of it at each touch point in the workflow.  There I really do want to retain
a copy of the document whenever a change has taken place, referenced by the
current state and document URI along with other important information such
as ownership and date/time stamp.

 

However, there are some asynchronous back-end processing actions that can be
taken on the document.  These can be spawned concurrently, with an update
made to the status table when each completes.  I want to make sure that I
understand the concurrency issues related to updates to the status record.
I think I can assume that there really won't be any need for a locking
mechanism, that is, that each response will update the status table
atomically.  I plan to have a separate status for each of the asynchronous
events, as the completion of all such statuses will indicate that the record
is ready for the next stage.

 

Thanks for any suggestions and insight into this!

 

Tim

 

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
