Re: [Dspace-tech] suppress indexing a collection

Mark Ludwig Mon, 26 Nov 2012 07:40:37 -0800

Helix-

Thanks for your prompt reply.


You confirmed what I suspected. This project may call for a separate
instance because of the need to separate the indexing.

Re. your question about size and machine resources, this is not
yet a problem in our production DSpace repository. Our greatest bottleneck
is not usage. Usage is minimal because most of our collections are private.
Our greatest bottleneck is the effort required to format input for DSpace
import. Many of our largest collections remain to be loaded because we
just do not have time to write the programs that parse the filenames into
DSpace Dublin Core. With our large collections, the only available metadata
is in the filenames, and it just takes too much heads-down scripting and
script running to clean up filenames and produce good Dublin Core XML.
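
To make the filename-to-metadata step concrete, here is a minimal sketch
of the kind of script described above. The naming scheme
(<YYYY-MM-DD>_<author>_<title-words>.<ext>) and the function name are
hypothetical, chosen only for illustration; real collections will need
their own pattern and cleanup rules. The output follows the dublin_core.xml
layout used by DSpace's Simple Archive Format.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naming scheme: <YYYY-MM-DD>_<author>_<title-words>.<ext>
# e.g. "1923-05-14_Smith-John_annual-report.tif"
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<author>[^_]+)_(?P<title>.+)\.[A-Za-z0-9]+$"
)


def filename_to_dublin_core(filename):
    """Return a dublin_core.xml string built from one filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        # Surface bad names instead of silently emitting poor metadata.
        raise ValueError("filename does not match the assumed pattern: " + filename)

    # (element, qualifier, value) triples; hyphens stand in for spaces.
    fields = [
        ("title", "none", match["title"].replace("-", " ")),
        ("contributor", "author", match["author"].replace("-", " ")),
        ("date", "issued", match["date"]),
    ]

    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(filename_to_dublin_core("1923-05-14_Smith-John_annual-report.tif"))
```

Raising on names that do not match keeps the exceptions visible, so they
can be fixed by hand or handled by a second pattern.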

Mark

DSpace 1.7.2 xmlui Discovery off.
http://ubir.buffalo.edu

On Mon, Nov 26, 2012 at 6:20 AM, helix84 <[email protected]> wrote:

> On Mon, Nov 26, 2012 at 4:29 AM, Mark Ludwig <[email protected]> wrote:
> > If you have a collection you do not want indexed with your other dspace
> > collections, is there a way to index the collection separately or not at
> > all?
>
> Hi Mark,
>
> please, always include the information which DSpace version, interface
> and theme you're using and whether you're using Discovery.
>
> Unfortunately, you cannot easily exclude a collection from indexing,
> at least not without modifying code. AFAIK, the search and browse
> indexes are not separate, so it's not even possible to have something
> browsable, but not searchable.
>
> What you could do is withdraw items from public display, which would
> make them accessible only to administrators.
>
> The only practical solution I can offer you is to move that content to
> a separate DSpace instance. It can even run on the same server and
> servlet container (and even the same DSpace webapps if you wish), but
> it will have its own URL, database and assetstore (and possibly
> theme). We currently recommend the same solution (separate instances)
> to separate public vs. dark archives.
>
> > Also, is there any way to block internet crawlers from indexing this one
> > collection in dspace?
>
> Sure, the standard solution is the best - use robots.txt. That will
> work if you move to the separate repository, because within one
> repository, you can't tell from the URL (which is in handle format)
> which community/collection an item belongs to.
>
> > It so happens that this particular collection would be very large,
> > about 750,000 pages as individual documents. Is there a practical
> > point at which a separate dspace instance is appropriate?
>
> That sounds like a moderate size. There are really no intentional
> limits within DSpace, you're restricted only by the amount of RAM and
> CPU cycles. If you hit a problem, you may want to consider using a
> reverse caching proxy (or an army of them for really large
> installations). Do you feel like you're hitting any limits already?
>
>
> Regards,
> ~~helix84
>
> Compulsory reading: DSpace Mailing List Etiquette
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>
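
As a rough illustration of the robots.txt suggestion in the quoted reply:
once the collection lives in its own instance, a blanket rule for that
host is enough, since no per-collection URL pattern is needed. The sketch
below checks such a rule with Python's standard urllib.robotparser. The
host name and handle are made up for the example, and robots.txt is only
advisory, so it stops well-behaved crawlers rather than all of them.

```python
from urllib.robotparser import RobotFileParser

# Blanket rule for a hypothetical separate instance that should not be crawled.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def crawler_may_fetch(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical item URL in handle format on the separate instance.
    item_url = "http://dark.example.edu/handle/123456789/42"
    print(crawler_may_fetch(ROBOTS_TXT, item_url))
```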



-- 
Mark Ludwig
Director of Research Systems Development
University Libraries
SUNY at Buffalo
Buffalo, NY 14260
716 645 5952


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
