Hi Mark,

Thanks a lot for chipping in. All that you have described is exactly what 
is happening. To give some more details:

   - There is the single big transaction being built up, and it does not 
   save until all the items in the collection have been processed.
   - My task implementation is annotated with the "@Distributive" 
   annotation because I need it to report its status only once it has 
   iterated over all of its items (the task implements a custom report 
   method that generates a CSV file with item results and emails it to 
   the administrators).
   - I couldn't come up with a better way of doing it. Without that 
   annotation, each individual task reports its own status and 
   hence sends an email.
   - In testing when / how commits happened, I was trying to get the 
   individual item task to commit its changes, but as you report, there is the 
   single context and after the first task commit I was getting a SQLException 
   concerning the resultset being closed.

For the time being, until I come up with a better approach to this bulk 
operation, the small change to the DOIConsumer (replace commit with 
uncacheEntity) has helped with the lock issue, and the long task completes 
successfully and all changes are committed to the DB as expected. Do you 
see any issue with this change?

Is there any way of solving the issue with the single context? Curation 
tasks are a good framework for allowing bulk operations, but they should 
allow for changes to be committed frequently in cases where there is no 
dependency between single tasks being run. Although I can see this being 
tricky.
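
To illustrate what "committed frequently" could look like, here is a 
minimal sketch of a loop that commits every N items. It is not DSpace 
code: UnitOfWork, CountingUnitOfWork and processWithPeriodicCommit are 
hypothetical names invented for this example, and UnitOfWork merely 
stands in for whatever object owns the transaction. It also assumes the 
items really are independent of each other.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class PeriodicCommitSketch {
    /** Hypothetical stand-in for the object that owns the transaction; NOT the DSpace Context. */
    interface UnitOfWork {
        void commit();
    }

    /** Test double that only counts how many times commit() was called. */
    static class CountingUnitOfWork implements UnitOfWork {
        int commits = 0;

        @Override
        public void commit() {
            commits++;
        }
    }

    /**
     * Runs the task on every item and commits after each full batch,
     * plus once at the end if a partial batch is left over.
     * Returns the number of items processed.
     */
    static <T> int processWithPeriodicCommit(List<T> items, int batchSize,
                                             UnitOfWork work, Consumer<T> task) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int processed = 0;
        for (T item : items) {
            task.accept(item);
            processed++;
            if (processed % batchSize == 0) {
                work.commit(); // a full batch is done
            }
        }
        if (processed % batchSize != 0) {
            work.commit(); // flush the trailing partial batch
        }
        return processed;
    }

    public static void main(String[] args) {
        CountingUnitOfWork work = new CountingUnitOfWork();
        int done = processWithPeriodicCommit(Arrays.asList("a", "b", "c", "d", "e"), 2,
                work, item -> { /* per-item work would go here */ });
        System.out.println(done + " items, " + work.commits + " commits");
    }
}
```

With a single shared context this only works if committing does not 
invalidate the sequence being iterated, which is exactly the part that 
failed for me.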

I did try the route of creating a Script using the DSpaceRunnable classes 
which would use an iterator, but I was still facing the issue with the 
transaction lock, and this script was much more inefficient in terms of 
running time.

Best,
Agustina

On Wednesday, 22 March 2023 at 15:41:29 UTC Mark H. Wood wrote:

Interesting. It sounds to me like there are two problems here, with a 
common source: 

o a (potentially) gigantic transaction is built up over a sequence of 
operations which should be unrelated; 

o a (potentially) gigantic queue of events which should be unrelated 
is built up within the curation run's Context. 

It sounds to me as though none of the tasks in the run is committing 
its work. The default curation scope is OPEN, so Curator is not 
committing. The event handler in question commits its own work, and 
in so doing, the first time it is called by the dispatcher, 
inadvertently commits all of the *other* curation work that should 
have been committed already. 

I can find no documentation for this, but it appears that event 
handlers should not commit work. The design of Context seems to 
depend on this. If true, we should document this requirement 
thoroughly. 

++++++++++++++++++ 

It seems to me that a task designed to be run in OPEN scope must 
commit its own work. There is, however, a problem with this: Context 
contains only one Session. A bulk operation needs to keep a 
transaction open while it consumes the sequence of entities on which 
it is to operate, and also to be able to commit work (closing a 
transaction) whenever it modifies one of those entities. I suppose 
that one could work around this by draining the sequence into a List 
and then reloading them one at a time for processing, but how 
inefficient that is! 
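
The workaround described above (drain first, then reload and commit one 
at a time) can be sketched as follows. This is only an illustration 
under stated assumptions: FakeStore is a hypothetical in-memory stand-in 
for the database, not a DSpace class, and the sketch drains identifiers 
rather than whole entities to limit what is held in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

class DrainThenReloadSketch {
    /** Hypothetical in-memory store standing in for the database; NOT a DSpace API. */
    static class FakeStore {
        private final Map<UUID, String> committed = new LinkedHashMap<>();
        private final Map<UUID, String> pending = new LinkedHashMap<>();
        int commits = 0;

        void seed(UUID id, String value) {
            committed.put(id, value);
        }

        /** Plays the role of the query whose result sequence must stay open while read. */
        Iterator<UUID> findAllIds() {
            return new ArrayList<>(committed.keySet()).iterator();
        }

        String load(UUID id) {
            return pending.getOrDefault(id, committed.get(id));
        }

        void update(UUID id, String value) {
            pending.put(id, value);
        }

        void commit() {
            committed.putAll(pending);
            pending.clear();
            commits++;
        }

        String committedValue(UUID id) {
            return committed.get(id);
        }
    }

    /** Drains the id sequence completely, then reloads, modifies and commits each entity. */
    static int processAll(FakeStore store) {
        List<UUID> ids = new ArrayList<>();
        Iterator<UUID> it = store.findAllIds();
        while (it.hasNext()) {
            ids.add(it.next());
        }
        // The sequence is fully consumed here, so a commit can no longer cut it short.
        int processed = 0;
        for (UUID id : ids) {
            String value = store.load(id);                     // reload one entity
            store.update(id, value.toUpperCase(Locale.ROOT)); // placeholder modification
            store.commit();                                    // commit this item's change
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.seed(UUID.randomUUID(), "first");
        store.seed(UUID.randomUUID(), "second");
        System.out.println(processAll(store) + " items, " + store.commits + " commits");
    }
}
```

The cost is the one noted above: every entity is fetched a second time, 
in exchange for small, independent transactions.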

-- 
Mark H. Wood 
Lead Technology Analyst 

University Library 
Indiana University - Purdue University Indianapolis 
755 W. Michigan Street 
Indianapolis, IN 46202 
317-274-0749 
www.ulib.iupui.edu 
