[
https://jira.duraspace.org/browse/DS-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Graham Triggs resolved DS-470.
------------------------------
Resolution: Duplicate
> Batch import times increase drastically as repository size increases; patch
> to mitigate the problem
> ---------------------------------------------------------------------------------------------------
>
> Key: DS-470
> URL: https://jira.duraspace.org/browse/DS-470
> Project: DSpace
> Issue Type: Improvement
> Components: DSpace API
> Affects Versions: 1.6.0
> Reporter: Simon Brown
> Fix For: 1.7.0
>
> Attachments: batch_importer_speedup.patch, prune.patch
>
>
> As mentioned by my colleague Tom De Mulder on dspace-tech and on his blog
> (http://tdm27.wordpress.com/2010/01/19/dspace-1-6-scalability-testing/),
> the time taken for batch imports to run increases as the repository grows.
> Having profiled the importer in our 1.6.0-RC1 install, we determined that
> most (80%-90%) of the time was spent in calls to
> IndexBrowse.pruneIndexes().
> The reason for this is that IndexBrowse.indexItem() calls pruneIndexes(), so
> every time an item is indexed, the indexes are pruned. For any batch of size
> n, where n > 1, that is n prune calls where one would do, i.e. (n - 1) more
> than necessary.
> Increasing the visibility of pruneIndexes(), removing the call from
> IndexBrowse.indexItem(), and making a single call at the end of the
> BrowseConsumer.end() method reduces this to once per event queue run.
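
To make the call counts concrete, here is a minimal toy model of the two strategies. ToyBrowseIndex and PruneCountDemo are hypothetical stand-ins written for this note, not the real DSpace classes; the prune is reduced to a counter, and only the method names indexItem(), indexItemNoPrune() and pruneIndexes() are taken from the issue text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the browse indexer; NOT the real DSpace IndexBrowse.
class ToyBrowseIndex {
    private final List<Integer> indexed = new ArrayList<>();
    int pruneCalls = 0;

    // Models the behaviour described in the issue: index, then prune straight away.
    void indexItem(int itemId) {
        indexItemNoPrune(itemId);
        pruneIndexes();
    }

    // Models the proposed variant: index only, leave pruning to the caller.
    void indexItemNoPrune(int itemId) {
        indexed.add(itemId);
    }

    // Stand-in for the expensive prune; here it only counts invocations.
    void pruneIndexes() {
        pruneCalls++;
    }
}

class PruneCountDemo {
    // Prune after every item: n prune calls for a batch of n.
    static int pruneCallsPerItem(int n) {
        ToyBrowseIndex index = new ToyBrowseIndex();
        for (int i = 0; i < n; i++) {
            index.indexItem(i);
        }
        return index.pruneCalls;
    }

    // Prune once after the whole batch: a single prune call.
    static int pruneCallsPerBatch(int n) {
        ToyBrowseIndex index = new ToyBrowseIndex();
        for (int i = 0; i < n; i++) {
            index.indexItemNoPrune(i);
        }
        index.pruneIndexes();
        return index.pruneCalls;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("prune calls, per item:  " + pruneCallsPerItem(n));
        System.out.println("prune calls, per batch: " + pruneCallsPerBatch(n));
    }
}
```

The model says nothing about how long a real prune takes; it only shows that the number of prune calls drops from n to 1 for a batch of n items.
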
> However, the batch importer calls Context.commit() after each item is
> imported. Context.commit() runs the event queue, thus causing one event queue
> run per imported item.
> This patch addresses both of these issues in a way which has a minimal effect
> on the rest of the code base; I don't necessarily consider it to be the
> "best" way, but I wanted to keep the patch small so it could be put out. What
> it does is:
> 1. create an IndexBrowse.indexItemNoPrune() method, which is called from the
> BrowseConsumer class instead of indexItem(). Other calls to indexItem() are
> not affected.
> 2. Call pruneIndexes() from BrowseConsumer.end()
> 3. Change the call in the batch importer from Context.commit() to
> Context.getDBConnection().commit(). The only effective difference between the
> two is that the event queue is not run; I think that a better solution might
> be to move the code to run the event queue from the Context.commit() method
> to the Context.complete() method, but I don't know what effect that will have
> on the rest of the code.
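
The effect of point 3 can be sketched with another toy model. ToyContext and ImportCommitDemo are hypothetical and do not reproduce org.dspace.core.Context; in particular, the sketch assumes that events still queued are dispatched once when the context is completed, which the issue text does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a context with an event queue; NOT the real DSpace Context.
class ToyContext {
    private final List<String> pendingEvents = new ArrayList<>();
    int queueRuns = 0;
    int dbCommits = 0;

    void addEvent(String event) {
        pendingEvents.add(event);
    }

    // Models Context.commit(): run the event queue, then commit the database.
    void commit() {
        runEventQueue();
        commitDatabaseOnly();
    }

    // Models Context.getDBConnection().commit(): database commit, queue untouched.
    void commitDatabaseOnly() {
        dbCommits++;
    }

    // Assumption: anything still queued is dispatched once on completion.
    void complete() {
        commit();
    }

    // Counts a run only when there is something to dispatch.
    private void runEventQueue() {
        if (pendingEvents.isEmpty()) {
            return;
        }
        queueRuns++;
        pendingEvents.clear();
    }
}

class ImportCommitDemo {
    // Importer commits through the context after each item.
    static int queueRunsWithContextCommit(int items) {
        ToyContext ctx = new ToyContext();
        for (int i = 0; i < items; i++) {
            ctx.addEvent("item " + i + " imported");
            ctx.commit();
        }
        ctx.complete();
        return ctx.queueRuns;
    }

    // Importer commits only the database connection after each item.
    static int queueRunsWithConnectionCommit(int items) {
        ToyContext ctx = new ToyContext();
        for (int i = 0; i < items; i++) {
            ctx.addEvent("item " + i + " imported");
            ctx.commitDatabaseOnly();
        }
        ctx.complete();
        return ctx.queueRuns;
    }

    public static void main(String[] args) {
        int items = 1000;
        System.out.println("queue runs, Context.commit():    " + queueRunsWithContextCommit(items));
        System.out.println("queue runs, connection commit(): " + queueRunsWithConnectionCommit(items));
    }
}
```

Combined with pruning from the consumer's end(), one queue run per batch means one prune per batch, which is the outcome the patch aims for.
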
> As noted in Tom's blog post linked above, these changes, on a repository with
> in excess of 120,000 items, took imports from 4.7 seconds/item to
> 4.9 items/second (roughly 0.2 seconds/item).
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel