Hi,
I was finally able to significantly speed up our online item ingest
process by suppressing the index pruning on every Item update. After setting
the log level to DEBUG and tracing the process through, I noticed it was doing
a LOT of updates, and this is where the ingest was really slowing down. This
has been an outstanding problem of ours for a long time, and our users are
very happy not to have this slow response time anymore, especially when they
click "I grant the license".
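The change itself isn't shown here, so the following is only a minimal sketch of the general idea, assuming a hypothetical `Indexer` class (none of these names are DSpace APIs). Instead of pruning after every Item update, the indexer records that a prune is owed and runs it once at the end of a batch of updates.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: defer index pruning instead of pruning on every update.
 * Indexer, indexItem(int), flush() and pruneIndexes() are illustrative names,
 * not DSpace APIs.
 */
public class DeferredPruneSketch {

    static class Indexer {
        final List<Integer> indexed = new ArrayList<>();
        int pruneCount = 0;
        private final boolean pruneOnEveryUpdate;
        private boolean dirty = false;

        Indexer(boolean pruneOnEveryUpdate) {
            this.pruneOnEveryUpdate = pruneOnEveryUpdate;
        }

        void indexItem(int itemId) {
            indexed.add(itemId);
            if (pruneOnEveryUpdate) {
                pruneIndexes();      // original behaviour: prune after each update
            } else {
                dirty = true;        // just remember that a prune is owed
            }
        }

        /** Called once at the end of a batch of updates. */
        void flush() {
            if (dirty) {
                pruneIndexes();
                dirty = false;
            }
        }

        private void pruneIndexes() {
            // Stand-in for an expensive pass whose cost grows with repository size.
            pruneCount++;
        }
    }

    public static void main(String[] args) {
        Indexer eager = new Indexer(true);
        Indexer deferred = new Indexer(false);
        for (int id = 1; id <= 5; id++) {
            eager.indexItem(id);
            deferred.indexItem(id);
        }
        eager.flush();
        deferred.flush();
        System.out.println("eager prunes: " + eager.pruneCount);       // 5
        System.out.println("deferred prunes: " + deferred.pruneCount); // 1
    }
}
```

The trade-off is that stale index entries survive until `flush()` runs, so the batch boundary has to be somewhere the index is guaranteed to be pruned eventually.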
Sue
________________________________
From: Graham Triggs [mailto:[email protected]]
Sent: Tuesday, February 23, 2010 2:11 PM
To: Simon Brown
Cc: [email protected] Tech
Subject: Re: [Dspace-tech] Sloooow submission process in DSpace 1.5.1
Hi Simon,
On 23 February 2010 17:47, Simon Brown
<[email protected]> wrote:
Sorry to stick my oar in here, but...
Oars are always welcome.
I don't think this is the case. I'm sure it was the intention, but
from what we've been able to determine, each DescribeStep for in-progress
items calls InProgressSubmission.update(), which for both
workflow and non-workflow items calls Item.update(), which fires a
MODIFY_METADATA event for that item. The BrowseConsumer will process
that event whether the item is installed or not. We determined this
after an increasing number of user complaints about the submission
process "slowing down", and added an isArchived() check to our
BrowseConsumer, which made the submission process noticeably snappier.
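The call chain described above can be modelled roughly as follows. This is a hypothetical sketch, not DSpace code: `Item`, `CountingConsumer` and `update()` are stand-ins, and the number of Describe pages is an assumption. It only shows that every in-progress update produces an event that reaches the consumer, even though the item is never archived.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the submission event flow; none of these are DSpace classes. */
public class SubmissionEventFlowSketch {

    enum EventType { MODIFY_METADATA }

    static class Item {
        boolean archived = false; // still a workspace/workflow item
    }

    /** Records every event it is handed, as an unguarded consumer would. */
    static class CountingConsumer {
        final List<EventType> seen = new ArrayList<>();

        void consume(Item item, EventType type) {
            seen.add(type);
        }
    }

    /** Stand-in for Item.update(): every call fires a MODIFY_METADATA event. */
    static void update(Item item, CountingConsumer consumer) {
        consumer.consume(item, EventType.MODIFY_METADATA);
    }

    public static void main(String[] args) {
        Item inProgress = new Item();
        CountingConsumer consumer = new CountingConsumer();
        int describeSteps = 3; // hypothetical number of Describe pages
        for (int step = 0; step < describeSteps; step++) {
            // Models InProgressSubmission.update() -> Item.update() on each page.
            update(inProgress, consumer);
        }
        System.out.println(consumer.seen.size() + " events for an unarchived item");
    }
}
```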
Yes, that probably is happening at the BrowseConsumer level (the event mechanism
/ BrowseConsumer was added after this browse code was committed, so I'm not 100%
sure of the circumstances of its use).
However, the BrowseConsumer calls indexItem(), which has the explicit check in
it:
    if (item.isArchived() || item.isWithdrawn())
    {
        indexItem(new ItemMetadataProxy(item));
        // Ensure that we remove any invalid entries
        pruneIndexes();
    }
So the indexing / pruneIndexes() won't happen if the item is not in either
the 'archive' or 'withdrawn' state, and it shouldn't be in either whilst it is
still in the workspace / workflow. Although it passes through the browse indexer,
it shouldn't be doing anything expensive (or anything that gets more expensive
with repository size) before installItem() is called.
AFAIK, the BrowseConsumer shouldn't have just an isArchived() check, as that
would prevent indexes being updated correctly when an item is withdrawn. But it
could replicate the if (isArchived() || isWithdrawn()) check, and doing it in
the BrowseConsumer would avoid some overhead that is incurred when IndexBrowse
is created.
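A minimal sketch of the guard suggested above, applied before any indexing work is set up. `Item` here is a hypothetical stand-in rather than `org.dspace.content.Item`, and it assumes (as the argument above implies) that a withdrawn item reports `isArchived()` as false. The archived-only variant is included for comparison, to show the withdrawal case it would miss.

```java
/**
 * Hypothetical sketch of a BrowseConsumer-level guard. Item is a stand-in,
 * not org.dspace.content.Item.
 */
public class BrowseConsumerGuardSketch {

    static class Item {
        private final boolean archived;
        private final boolean withdrawn;

        Item(boolean archived, boolean withdrawn) {
            this.archived = archived;
            this.withdrawn = withdrawn;
        }

        boolean isArchived() { return archived; }
        boolean isWithdrawn() { return withdrawn; }
    }

    /** Mirrors the check inside indexItem(), but applied before any indexer is built. */
    static boolean shouldIndex(Item item) {
        return item.isArchived() || item.isWithdrawn();
    }

    /** The archived-only variant, shown for comparison. */
    static boolean archivedOnly(Item item) {
        return item.isArchived();
    }

    public static void main(String[] args) {
        Item inWorkspace = new Item(false, false);
        Item installed = new Item(true, false);
        // Assumption: withdrawal clears the archived flag and sets the withdrawn flag.
        Item withdrawn = new Item(false, true);

        System.out.println(shouldIndex(inWorkspace)); // false: skip the event entirely
        System.out.println(shouldIndex(installed));   // true
        System.out.println(shouldIndex(withdrawn));   // true: indexes must be updated
        System.out.println(archivedOnly(withdrawn));  // false: withdrawal would be missed
    }
}
```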
G
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech