Hi,
I was finally able to significantly speed up our online item ingest
process by suppressing the index pruning on every Item update. After setting
the log level to DEBUG and tracing the process through, I noticed it was doing
a LOT of updates, and that is where the ingest was really slowing down. This
has been an outstanding problem of ours for a long time now, and our users are
very happy not to have this slow response time anymore, especially when they
click "I grant the license".
Sue
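
Sue doesn't say exactly how the per-update pruning was suppressed. One common way to get this kind of speed-up is to defer the prune and run it once after a batch of updates rather than after every single one. The Java sketch below illustrates only that general idea, under stated assumptions: `Indexer`, `setDeferPruning()` and `flush()` are hypothetical names, not DSpace APIs, and `pruneIndexes()` here merely counts calls instead of touching the browse tables.

```java
/**
 * Hypothetical sketch: defer index pruning until the end of a batch.
 * None of these names come from DSpace; pruneIndexes() only counts calls.
 */
public class DeferredPruneSketch {

    static class Indexer {
        private boolean deferPruning = false;
        private boolean pruneNeeded = false;
        int pruneCount = 0;

        void setDeferPruning(boolean defer) {
            this.deferPruning = defer;
        }

        void indexItem(int itemId) {
            // (Real indexing of itemId would happen here.)
            if (deferPruning) {
                pruneNeeded = true; // remember a prune is due, run it later
            } else {
                pruneIndexes();     // prune after every single update
            }
        }

        /** Runs a deferred prune once, e.g. after the whole batch. */
        void flush() {
            if (pruneNeeded) {
                pruneIndexes();
                pruneNeeded = false;
            }
        }

        private void pruneIndexes() {
            pruneCount++; // stand-in for the expensive stale-entry clean-up
        }
    }

    public static void main(String[] args) {
        Indexer perUpdate = new Indexer();
        for (int id = 0; id < 5; id++) {
            perUpdate.indexItem(id);
        }

        Indexer batched = new Indexer();
        batched.setDeferPruning(true);
        for (int id = 0; id < 5; id++) {
            batched.indexItem(id);
        }
        batched.flush();

        System.out.println("prunes per update: " + perUpdate.pruneCount); // 5
        System.out.println("prunes deferred:   " + batched.pruneCount);   // 1
    }
}
```

The trade-off in this sketch is that stale entries stay in the index until `flush()` runs, so the deferred prune has to be triggered reliably at the end of the batch.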

________________________________
From: Graham Triggs [mailto:[email protected]]
Sent: Tuesday, February 23, 2010 2:11 PM
To: Simon Brown
Cc: [email protected] Tech
Subject: Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1

Hi Simon,
On 23 February 2010 17:47, Simon Brown <[email protected]> wrote:
> Sorry to stick my oar in here, but...

Oars are always welcome.

> I don't think this is the case. I'm sure it was the intention, but
> from what we've been able to determine, each DescribeStep for
> in-progress items calls InProgressSubmission.update(), which for both
> workflow and non-workflow items calls Item.update(), which will fire a
> MODIFY_METADATA event for that item. The BrowseConsumer will process
> that event whether the item is installed or not. We determined this
> after an increasing number of user complaints about the submission
> process "slowing down" and added an isArchived() check to our
> BrowseConsumer, which made the submission process noticeably snappier.
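
To make the call chain Simon describes concrete, here is a simplified, self-contained Java model of it. The classes are hypothetical stand-ins, not the real DSpace types (the real consumer receives a Context and an Event), and the three describe pages are an assumption. It shows that each update of an in-progress item delivers a MODIFY_METADATA event to the browse consumer even though the item is not archived.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the chain:
 * DescribeStep -> InProgressSubmission.update() -> Item.update()
 *   -> MODIFY_METADATA event -> BrowseConsumer.
 * These are simplified stand-ins, not DSpace classes.
 */
public class EventChainSketch {

    enum EventType { MODIFY_METADATA }

    interface EventConsumer {
        void consume(Item item, EventType type);
    }

    static class Item {
        boolean archived = false; // still in workspace/workflow
        private final List<EventConsumer> consumers;

        Item(List<EventConsumer> consumers) {
            this.consumers = consumers;
        }

        /** Every update fires MODIFY_METADATA, archived or not. */
        void update() {
            for (EventConsumer c : consumers) {
                c.consume(this, EventType.MODIFY_METADATA);
            }
        }
    }

    /** Stand-in for a workspace or workflow item wrapping an Item. */
    static class InProgressSubmission {
        final Item item;

        InProgressSubmission(Item item) {
            this.item = item;
        }

        void update() {
            item.update();
        }
    }

    /** Counts events instead of doing any browse indexing. */
    static class CountingBrowseConsumer implements EventConsumer {
        int eventsSeen = 0;

        @Override
        public void consume(Item item, EventType type) {
            eventsSeen++;
        }
    }

    public static void main(String[] args) {
        CountingBrowseConsumer browse = new CountingBrowseConsumer();
        List<EventConsumer> consumers = new ArrayList<>();
        consumers.add(browse);
        InProgressSubmission submission =
                new InProgressSubmission(new Item(consumers));

        // Assume three describe pages, each saving via update().
        for (int page = 0; page < 3; page++) {
            submission.update();
        }
        // prints "events seen: 3, archived: false"
        System.out.println("events seen: " + browse.eventsSeen
                + ", archived: " + submission.item.archived);
    }
}
```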

Yes, that probably is happening at the BrowseConsumer level (the event
mechanism / BrowseConsumer was added after this browse code was committed,
and I'm not 100% sure of the circumstances of its use).

However, the BrowseConsumer calls indexItem(), which contains an explicit
check:

        if (item.isArchived() || item.isWithdrawn())
        {
            indexItem(new ItemMetadataProxy(item));

            // Ensure that we remove any invalid entries
            pruneIndexes();
        }

So the indexing / pruneIndexes() won't happen if the item is in neither the
'archived' nor the 'withdrawn' state, and it shouldn't be in either whilst it
is still in the workspace / workflow. Although it passes through the browse
indexer, it shouldn't be doing anything expensive (or anything that gets more
expensive as the repository grows) before installItem() is called.

AFAIK, the BrowseConsumer shouldn't have just an isArchived() check, as that
would prevent the indexes from being updated correctly when an item is
withdrawn. But it could replicate the if (isArchived() || isWithdrawn())
check, and doing it in the BrowseConsumer would avoid some of the overhead
incurred when IndexBrowse is created.

G
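
Graham's suggestion of replicating the (isArchived() || isWithdrawn()) test in the BrowseConsumer can be sketched roughly as follows. This is a simplified Java model under stated assumptions: `Item`, `Indexer` and `BrowseConsumer` are stand-ins (the real consumer receives a Context and an Event, and the real indexer is IndexBrowse), and the `Supplier` factory is a hypothetical device to show that no indexer is created for workspace or workflow items.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch: the consumer repeats indexItem()'s
 * (isArchived() || isWithdrawn()) test before creating an indexer.
 * Item, Indexer and BrowseConsumer are stand-ins, not DSpace classes.
 */
public class ConsumerGuardSketch {

    static class Item {
        private final boolean archived;
        private final boolean withdrawn;

        Item(boolean archived, boolean withdrawn) {
            this.archived = archived;
            this.withdrawn = withdrawn;
        }

        boolean isArchived() { return archived; }
        boolean isWithdrawn() { return withdrawn; }
    }

    /** Stand-in for IndexBrowse; assume construction has real overhead. */
    static class Indexer {
        void indexItem(Item item) {
            // Indexing and pruning would happen here.
        }
    }

    static class BrowseConsumer {
        private final Supplier<Indexer> indexerFactory;
        int indexersCreated = 0;

        BrowseConsumer(Supplier<Indexer> indexerFactory) {
            this.indexerFactory = indexerFactory;
        }

        void consume(Item item) {
            // Checking isArchived() alone would skip withdrawn items and
            // leave their browse entries out of date.
            if (!(item.isArchived() || item.isWithdrawn())) {
                return; // workspace/workflow item: skip without an indexer
            }
            Indexer indexer = indexerFactory.get();
            indexersCreated++;
            indexer.indexItem(item);
        }
    }

    public static void main(String[] args) {
        BrowseConsumer consumer = new BrowseConsumer(Indexer::new);
        consumer.consume(new Item(false, false)); // in-progress submission
        consumer.consume(new Item(true, false));  // archived
        consumer.consume(new Item(false, true));  // withdrawn
        System.out.println("indexers created: " + consumer.indexersCreated); // 2
    }
}
```

Creating the indexer only after the guard passes is what avoids the set-up overhead for in-progress items, while withdrawn items still reach the indexer.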
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
