On Mon, Nov 26, 2012 at 4:29 AM, Mark Ludwig <[email protected]> wrote:
> If you have a collection you do not want indexed with your other dspace
> collections, is there a way to index the collection separately or not at
> all?

Hi Mark,

Please always state which DSpace version, interface and theme you're
using, and whether you're using Discovery.

Unfortunately, you cannot easily exclude a collection from indexing,
at least not without modifying code. AFAIK, the search and browse
indexes are not separate, so it's not even possible to have something
browsable, but not searchable.

What you could do is withdraw items from public display, which would
make them accessible only to administrators.

The only practical solution I can offer you is to move that content to
a separate DSpace instance. It can even run on the same server and
servlet container (and even the same DSpace webapps if you wish), but
it will have its own URL, database and assetstore (and possibly
theme). We currently recommend the same solution (separate instances)
to separate public vs. dark archives.

> Also, is there any way to block internet crawlers from indexing this one
> collection in dspace?

Sure, the standard solution is the best: use robots.txt. That will
work if you move to the separate repository, because within one
repository, you can't tell from the URL (which is in handle format)
which community/collection an item belongs to.
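
To make the robots.txt point concrete, here is a minimal sketch in Python
that checks two rule sets with the standard library's robots.txt parser.
The host name dark.example.org and the handles 123456789/7 (collection)
and 123456789/42 (item) are made up for illustration; they are
assumptions, not values from any real DSpace instance.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a separate instance: block everything.
SEPARATE_INSTANCE = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt for a shared instance: block one collection's handle.
SHARED_INSTANCE = """\
User-agent: *
Disallow: /handle/123456789/7
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Made-up item URL; the item's handle says nothing about its collection.
ITEM_URL = "https://dark.example.org/handle/123456789/42"

print(is_crawlable(SEPARATE_INSTANCE, ITEM_URL))
print(is_crawlable(SHARED_INSTANCE, ITEM_URL))
```

Under these assumptions the first rule set blocks the item, while the
second leaves it crawlable: disallowing the collection's handle does not
cover items whose own handles do not share that prefix.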

> It so happens that this particular collection would be very large,
> about 750,000 pages as individual documents. Is there a practical
> point at which a separate dspace instance is appropriate?

That sounds like a moderate size. There are really no intentional
limits within DSpace; you're restricted only by the amount of RAM and
CPU cycles. If you hit a problem, you may want to consider using a
reverse caching proxy (or an army of them for really large
installations). Do you feel like you're hitting any limits already?


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech