My feeling is that a lot could be gained by writing changes to a temp file
and only swapping it into place if successful.
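
A rough sketch of what I mean, assuming Java 7's java.nio.file is available. AtomicWrite and writeAtomically are made-up names for illustration, not anything in the Fedora code base:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicWrite {

    /** Writes data to a temp file next to target, then renames it over target. */
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Create the temp file in the same directory so the rename stays on one file system.
        Path tmp = Files.createTempFile(dir, "commit-", ".tmp");
        try {
            Files.write(tmp, data);
            // Throws AtomicMoveNotSupportedException if the file system cannot do this.
            // Whether an existing target is replaced is implementation specific;
            // on POSIX file systems the rename replaces it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; removes the temp file after a failure.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-demo");
        Path target = dir.resolve("object.xml");
        writeAtomically(target, "<v1/>".getBytes(StandardCharsets.UTF_8));
        writeAtomically(target, "<v2/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

If the write fails part-way, the old file is untouched and only the temp file needs deleting.
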
Michael
On Nov 16, 2011 3:26 PM, "Stephen Bayliss" <
stephen.bayl...@acuityunlimited.net> wrote:
> This is following on from a conversation we had at yesterday's committer
> meeting where Eddie mentioned he had a scenario with some ingests failing
> under heavy load leading to potential inconsistencies, e.g. between the
> registry and the object and datastream store.
>
> I think there are two separate types of problem here:
>
> a) being able to debug to isolate the cause of a failure
> b) being able to fix the cause of the failure itself
>
> I have had cause recently to look again at DefaultDOManager.doCommit(...)
> - which is essentially where a new/modified/deleted digital object is
> committed to storage.
>
> Some observations here are that (aside from there being a lot of code -
> 300-odd lines in a single method), the structure and error reporting make
> it difficult to distinguish genuine causes of concern from situations like
> "this looks a bit odd but is probably ok".
>
> An example:
>
> If a commit fails (indicated by an exception being thrown), a tidy-up is
> executed by re-invoking doCommit(...) with "remove" set to true.
> Essentially this is a purge. If anything fails on the purge, a warning
> message is logged (including in some cases the text "but that might be
> ok"). It needs to be a warning because, if the ingest failed part-way
> through, some things won't be there as expected to clean up.
>
> I think there's a big difference in cleaning up after a failed operation
> and doing a purge. If a purge fails to remove something that was expected
> then that should be logged as an error; but as this code is used for both
> clean-up and purging it's not possible to distinguish between the two.
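
One way to make that distinction explicit might be to pass the reason for the removal along and pick the log level from it. This is only a sketch: RemovalLogging, Mode and reportRemovalFailure are hypothetical names, and java.util.logging is used just to keep the example self-contained (Level.SEVERE standing in for "error"); it does not reflect how doCommit(...) is actually written.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RemovalLogging {

    /** Why the removal code is running (hypothetical; not part of the existing code). */
    enum Mode { PURGE, CLEANUP_AFTER_FAILED_COMMIT }

    private static final Logger LOG = Logger.getLogger(RemovalLogging.class.getName());

    /** A failed removal is an error for a real purge, only a warning for clean-up. */
    static Level severityFor(Mode mode) {
        return mode == Mode.PURGE ? Level.SEVERE : Level.WARNING;
    }

    static void reportRemovalFailure(Mode mode, String pid, String component, Exception cause) {
        LOG.log(severityFor(mode),
                "Could not remove " + component + " for " + pid + " during " + mode,
                cause);
    }

    public static void main(String[] args) {
        Exception cause = new Exception("not found");
        // Logged at SEVERE: a purge expected this to exist.
        reportRemovalFailure(Mode.PURGE, "demo:1", "registry entry", cause);
        // Logged at WARNING: the failed ingest may never have created it.
        reportRemovalFailure(Mode.CLEANUP_AFTER_FAILED_COMMIT, "demo:1", "registry entry", cause);
    }
}
```
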
>
> Just one example - but it highlights (a) - being able to debug. The
> logging is not useful in terms of indicating genuine error conditions.
>
> I think we could do some beneficial refactoring of the existing code which
> hopefully would not risk changing existing functionality to better
> distinguish genuine error conditions.
>
> It would be useful if the various storage components (datastreams, foxml,
> resource index, registry) were wrapped within some basic
> transactioning/rollback capabilities - so that any cleanup code knows what
> it should be cleaning up (rather than attempting to clean up everything);
> and then anything that couldn't be cleaned up can be logged as an error.
> Similarly any code that tries to persist a modification to one of the
> storage components but fails should be logged as an error rather than a
> warning.
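
The rollback idea above could look roughly like the following: each storage component is wrapped as a step, and on failure only the steps that actually completed are undone, newest first. CommitSteps, Step and CommitFailedException are invented for illustration, and the sketch assumes a step that throws from apply() leaves nothing behind itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class CommitSteps {

    /** One storage component update (datastreams, foxml, resource index, registry). */
    interface Step {
        String name();
        void apply() throws Exception;
        void undo() throws Exception;
    }

    /** Carries the names of completed steps that could not be rolled back. */
    static class CommitFailedException extends Exception {
        final List<String> notRolledBack;

        CommitFailedException(String failedStep, Throwable cause, List<String> notRolledBack) {
            super("Step failed: " + failedStep, cause);
            this.notRolledBack = notRolledBack;
        }
    }

    /** Applies steps in order; on failure undoes only the completed ones, newest first. */
    static void commit(List<Step> steps) throws CommitFailedException {
        Deque<Step> done = new ArrayDeque<Step>();
        for (Step step : steps) {
            try {
                step.apply();
                done.push(step);
            } catch (Exception e) {
                List<String> notRolledBack = new ArrayList<String>();
                while (!done.isEmpty()) {
                    Step completed = done.pop();
                    try {
                        completed.undo();
                    } catch (Exception undoFailure) {
                        // This step was known to have been applied, so a failed undo
                        // is a genuine error rather than a "might be ok" warning.
                        notRolledBack.add(completed.name());
                    }
                }
                throw new CommitFailedException(step.name(), e, notRolledBack);
            }
        }
    }
}
```

Anything listed in notRolledBack is a real inconsistency and can be logged as an error, since the clean-up only touched what was known to have been written.
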
>
> There is potential for making things better in this code in general - I
> can see there could be other situations leading to an inconsistency. For
> example if managed content datastreams are successfully updated (or new
> versions added), but for instance the resource index update fails, then the
> foxml won't be updated and nor will the registry but the managed content
> updates will have already been persisted; so we have potential for an
> inconsistency. Again I'm not sure that the current exception handling and
> logging is of that much help in identifying problematic situations.
>
> I think we should focus on doing what we can to improve (a) in the first
> instance, if possible without disturbing too much of the logic already
> embodied in this piece of code. And to try and do this without negatively
> impacting performance given the focus of 3.6.
>
> Steve
>
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Fedora-commons-developers mailing list
> Fedora-commons-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
>
>