[ 
http://jira.dspace.org/jira/browse/DS-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11182#action_11182
 ] 

Tim Donohue commented on DS-470:
--------------------------------

>From discussion during DSpace Developers Mtg on Feb 17 2010 ( 
>http://www.duraspace.org/irclogs/index.php?date=2010-02-17 )

[15:37] <tdonohue>  http://jira.dspace.org/jira/browse/DS-470  : Batch import 
times increase drastically as repository size increases; patch to mitigate the 
problem
[15:37] <tdonohue> Oh, this is the large problem that grahamtriggs has been 
helping with
[15:38] <tdonohue> +1 for 1.6.1 or 1.7 :)
...
[15:38] <kshepherd> ah, uh oh.. this issue ;)
...
[15:39] <mhwood> DS-470, +1, post-1.6.0.
[15:39] <grahamtriggs> DS-470: +1 for addressing the issues post-1.6.0. -1 for 
applying the patch as it stands
[15:39] <tdonohue> yea, I think DS-470 will require a bit more 
work/discussion..i'd agree with grahamtriggs
[15:40] <tdonohue> ok, we'll leave it at that for now
[15:40] <kshepherd> DS-470 +1 to the general idea, but graham had some 
reasonable objections to viewing "speeding up batch jobs" as a priority over 
"reducing system load"
[15:40] <mhwood> Good point.

> Batch import times increase drastically as repository size increases; patch 
> to mitigate the problem
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DS-470
>                 URL: http://jira.dspace.org/jira/browse/DS-470
>             Project: DSpace 1.x
>          Issue Type: Improvement
>          Components: DSpace API
>    Affects Versions: 1.6.0
>            Reporter: Simon Brown
>            Priority: Minor
>             Fix For: 1.6.1
>
>         Attachments: batch_importer_speedup.patch, prune.patch
>
>
> As mentioned by my colleague Tom De Mulder on dspace-tech and at 
> http://tdm27.wordpress.com/2010/01/19/dspace-1-6-scalability-testing/ 
> As the repository grows, the time taken for batch imports to run also 
> increases. Having profiled the importer in our 1.6.0-RC1 install we 
> determined that most (80%-90%) of the time was spent in calls to 
> IndexBrowse.pruneIndexes(). 
> The reason for this is that IndexBrowse.indexItem() calls pruneIndexes(), so 
> every time an item is indexed, the indexes are pruned. For any batch of size 
> n, where n > 1, this is (n - 1) times more than is necessary.
> Increasing the visibility of pruneIndexes(), removing the call from 
> IndexBrowse.indexItem(), and making a single call at the end of the 
> BrowseConsumer.end() method reduces this to once per event queue run.
> However, the batch importer calls Context.commit() after each item is 
> imported. Context.commit() runs the event queue, thus causing one event queue 
> run per imported item. 
> This patch addresses both of these issues in a way which has a minimal effect 
> on the rest of the code base; I don't necessarily consider it to be the 
> "best" way, but I wanted to keep the patch small so it could be put out. What 
> it does is:
> 1. create an IndexBrowse.indexItemNoPrune() method, which is called from the 
> BrowseConsumer class instead of indexItem(). Other calls to indexItem() are 
> not affected.
> 2. Call pruneIndexes() from BrowseConsumer.end()
> 3. Change the call in the batch importer from Context.commit() to 
> Context.getDBConnection.commit(). The only effective difference between the 
> two is that the event queue is not run; I think that a better solution might 
> be to move the code to run the event queue from the Context.commit() method 
> to the Context.complete() method, but I don't know what effect that will have 
> on the rest of the code.
> As noted in Tom's blog post linked above, these changes, on a repository with 
> in excess of 120,000 items, brought import time from 4.7 seconds/item down to 
> 4.9 items/second.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://jira.dspace.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

------------------------------------------------------------------------------
Download Intel&reg; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs 
proactively, and fine-tune applications for parallel performance. 
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel

Reply via email to