We are running DSpace 1.7.2 on Solaris with Oracle.
After we loaded a couple of large items,
the nightly filter-media run failed to finish.
Once I removed the two items following the last one it reported
processing, filter-media completed normally.

The problem items contain large .tar.gz files that are backups of
file systems (3 and 8 GB). I don't think those files are the cause.
We are also loading the corresponding directory listings as .txt files.
One such directory listing was 70 MB, over 750,000 lines.
That means about a million occurrences of each of 7-8 words, plus
many other words, all in one item.

Basically, we don't need this kind of item indexed.
Is there a way to turn off indexing of an item
or a file within an item?  Are there any other
ways to deal with this?

I know I can probably compress these .txt
directory listings, so that as .tar.gz files
they will not be indexed.
But then I need to know the
size limit and check for it before loading.
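
The kind of pre-load check I have in mind would be something like the
rough Python sketch below. It is not part of DSpace. The function names
and the 10 MB threshold are placeholders I made up, since I don't know
the real limit, and it assumes (as above) that a gzipped listing is
skipped by the indexer.

```python
import gzip
import os
import shutil

# Hypothetical threshold: the size at which filter-media actually
# starts to struggle is unknown, so this value is only a placeholder.
MAX_INDEXABLE_BYTES = 10 * 1024 * 1024


def find_oversized_text_files(root, limit=MAX_INDEXABLE_BYTES,
                              extensions=(".txt",)):
    """Return a sorted list of (path, size) for text files over the limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized)


def gzip_file(path):
    """Write a gzipped copy next to the original and return its path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

Running find_oversized_text_files() on the staging directory before an
import would list the files to compress, and gzip_file() could then be
applied to each of them. The original .txt is left in place, so it still
has to be removed from the item by hand.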

Another issue is that DSpace never frees up the
disk space after items are deleted. I'd really like my
gigabytes back when I delete large items.
It looks like the index space is regained after filter-media
re-runs, but the space in the assetstore is wasted.

-- 
Mark Ludwig
Director of Research Systems Development
University Libraries
SUNY at Buffalo
Buffalo, NY 14260
716 645 5952
