Re: [Dspace-devel] Discovery Instances in DSpace 1.7

Graham Triggs Fri, 05 Nov 2010 12:55:28 -0700

On 5 November 2010 16:10, Mark Diggory <mdigg...@atmire.com> wrote:

> Enabling Discovery as a separate web application is possible if we make
> the following concessions:
>
> 1.) Enable the "Discovery Consumer" by default in dspace.cfg and
> accept that all apps including the traditional xmlui use it.
> 2.) Create a second war project under modules called something like
> "xmlui-beta"
> 3.) copy the xmlui.xconf into xmlui-beta/src/main/webapp/WEB-INF
> 4.) Configure the discovery aspects there.
>
> This would give you a second XMLUI instance with discovery enabled
> within it and be usable in the testathon without having to do any of
> the installation steps.
>


I can see the value in having a Discovery-enabled instance for testing, but
I stand by my point that the Discovery-enabled applications should be treated
as an entirely separate repository from the standard one.

Enabling the Discovery consumer in both sets of web applications skews
the testing environment and could give us confusing data. If, for
example, there were a problem with the Discovery Consumer, it would be useful
to have it isolated to one environment.

> Mark
>
> p.s. Peter, I think that we want to consider batch processing and the
> DiscoveryConsumer in changing to autocommit.  Ideally, we would seek a
> configuration that will optimize solr commits when processing a large
> number of items.  Note, until we get Browse completely out of the
> picture, we are stuck with that original problem with Browse
> interfering with batch loading scalability.
>
>

There have been a few improvements in this area in DSpace 1.7 recently. I
just ran a test on my MacBook Pro. My local repository started with 94072
items already installed.

Running the ItemImport command over a period of 5 minutes, I consistently
observed ingest rates of between 8 and 12 items per second (item counts at
one-minute intervals: 94722, 95060, 95550, 96249 and 96864 items installed).
This is using Postgres-based browse tables and a Lucene search index.
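
For anyone who wants to turn those counts into rates, here is a minimal sketch of the arithmetic. The class and method names are made up for illustration and are not part of DSpace. It assumes the five counts were sampled exactly 60 seconds apart and that the run started from the 94072 items mentioned above.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of DSpace.
final class IngestRate {
    // Returns items per second for each interval, assuming the counts were
    // sampled exactly intervalSeconds apart, starting from startCount.
    static double[] ratesPerSecond(long startCount, long[] counts, int intervalSeconds) {
        double[] rates = new double[counts.length];
        long previous = startCount;
        for (int i = 0; i < counts.length; i++) {
            rates[i] = (counts[i] - previous) / (double) intervalSeconds;
            previous = counts[i];
        }
        return rates;
    }

    public static void main(String[] args) {
        long start = 94072;                                  // items before the run
        long[] counts = {94722, 95060, 95550, 96249, 96864}; // counts at each minute
        double[] rates = ratesPerSecond(start, counts, 60);
        for (int i = 0; i < rates.length; i++) {
            System.out.printf(Locale.ROOT, "minute %d: %.1f items/s%n", i + 1, rates[i]);
        }
        // Average over the whole run: total items added / total seconds.
        double overall = (counts[counts.length - 1] - start) / (60.0 * counts.length);
        System.out.printf(Locale.ROOT, "overall: %.1f items/s%n", overall);
    }
}
```

Over the full five minutes the counts go from 94072 to 96864, which is 2792 items, or a little over 9 items per second on average.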

Note that these were metadata-only items, although not entirely random. If
you take a look in DSpace trunk, I've added a PubmedToImport class to the
org.dspace.testing package. It uses a SAX parser to write out DSpace import
format directories from a medline.xml file (you can easily generate a large
file consisting of many thousands of items from
http://www.ncbi.nlm.nih.gov). It's very rough around the edges, and it's not
a complete mapping of the data, but it provides a decent amount of
reasonably 'real world' test data very quickly.
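
To give a rough idea of the approach, here is a much-reduced sketch. It is not the PubmedToImport class from trunk: the class name, the item_N directory naming and the decision to map only ArticleTitle to dc.title are all assumptions made for illustration. It streams the input with SAX and writes one import directory per title, each holding a dublin_core.xml and an empty contents file (metadata only).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; NOT the PubmedToImport class in DSpace trunk.
final class MedlineTitlesToImport extends DefaultHandler {
    private final Path outputDir;
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;
    private int itemCount = 0;

    MedlineTitlesToImport(Path outputDir) {
        this.outputDir = outputDir;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("ArticleTitle".equals(qName)) {
            inTitle = true;
            text.setLength(0); // start collecting a new title
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("ArticleTitle".equals(qName)) {
            inTitle = false;
            try {
                writeItem(text.toString().trim());
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Writes one import directory: dublin_core.xml plus an empty contents file.
    private void writeItem(String title) throws IOException {
        Path itemDir = Files.createDirectories(outputDir.resolve("item_" + itemCount++));
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<dublin_core>\n"
                + "  <dcvalue element=\"title\" qualifier=\"none\">" + escape(title) + "</dcvalue>\n"
                + "</dublin_core>\n";
        Files.write(itemDir.resolve("dublin_core.xml"), xml.getBytes(StandardCharsets.UTF_8));
        Files.write(itemDir.resolve("contents"), new byte[0]);
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Parses the stream and returns the number of item directories written.
    static int convert(InputStream medlineXml, Path outputDir) throws Exception {
        MedlineTitlesToImport handler = new MedlineTitlesToImport(outputDir);
        SAXParserFactory.newInstance().newSAXParser().parse(medlineXml, handler);
        return handler.itemCount;
    }
}
```

Because SAX streams the file rather than loading it whole, this style should cope with very large inputs. The resulting directory would then be the source directory handed to ItemImport.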

G
