Hi Sue:
I don't have any immediate help, but I'm struck by how long the indexing job is
taking. I had a comparison done with one of our DSpace 1.6 repositories, which
is about half the size of yours (71,481 items) and holds mostly text-based
content (which I think yours does too?). On not particularly fast hardware, a
complete re-index took about 5 hours, not 5 days.
There may be some subtle limit in the code based on size - so to get started, I
did a 'profile' of our repo with respect to full-text content (which I am
assuming accounts for most of the indexing time - but I could be wrong). Here
is the 'profile' and the queries we used to get it. I'd be interested to see
what your repo looks like using the same metrics.
count of items: 71,481
count of bitstreams in text extract bundles (TEXT): 89,993
sum of all file sizes in text extract bundles: 7,695,414,829 bytes
average size of text extract bitstream: 85,511 bytes
Queries used:
select count(bs.bitstream_id)
from bundle b, bundle2bitstream b2b, bitstream bs
where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id
and b.name = 'TEXT';

select sum(bs.size_bytes)
from bundle b, bundle2bitstream b2b, bitstream bs
where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id
and b.name = 'TEXT';
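If it is easier to collect everything in one pass, the two queries above can be
folded into a single statement that also returns the average. What follows is
only a minimal sketch, not something we ran against our repository: it uses an
in-memory SQLite database as a stand-in for the real DSpace database, models
only the tables and columns that appear in the queries above, and the rows are
invented toy data. The names PROFILE_SQL and text_profile are hypothetical.

```python
import sqlite3

# Toy stand-in for the three DSpace tables used in the queries above.
# Only the columns those queries reference are modelled; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bundle (bundle_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bitstream (bitstream_id INTEGER PRIMARY KEY, size_bytes INTEGER);
    CREATE TABLE bundle2bitstream (bundle_id INTEGER, bitstream_id INTEGER);
    INSERT INTO bundle VALUES (1, 'ORIGINAL'), (2, 'TEXT'), (3, 'TEXT');
    INSERT INTO bitstream VALUES (10, 500000), (20, 80000), (30, 90000);
    INSERT INTO bundle2bitstream VALUES (1, 10), (2, 20), (3, 30);
""")

# Same joins and filter as the two queries above, with count, sum and
# average of the TEXT bundle bitstreams returned in a single row.
PROFILE_SQL = """
    select count(bs.bitstream_id), sum(bs.size_bytes), avg(bs.size_bytes)
    from bundle b, bundle2bitstream b2b, bitstream bs
    where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id
    and b.name = 'TEXT'
"""

def text_profile(connection):
    """Return (count, total_bytes, average_bytes) for TEXT bundle bitstreams."""
    count, total, average = connection.execute(PROFILE_SQL).fetchone()
    return count, total, average

count, total, average = text_profile(conn)
print(count, total, average)
```

The select statement itself should carry over to the real database unchanged,
assuming the avg aggregate behaves there as it does in SQLite. As a sanity
check on the figures above, 7,695,414,829 / 89,993 comes to roughly 85,511,
which matches the reported average.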
Thanks,
Richard
On Jun 19, 2010, at 7:50 PM, Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL
SERVICES COMPANY] wrote:
We have a large repository, currently with 140,376 Items. Due to user
complaints about search results, we recently turned off stemming in our DSpace
1.5.1 search by commenting out the following line in DSAnalyzer.java:
result = new PorterStemFilter(result);
Of course then we had to run index-init to rebuild the search indexes and we’ve
been having problems getting the job to finish. Due to the size of our
repository, index-init takes about 5 or 6 days to complete and now it’s failed
twice due to the following error:
An unexpected error has been detected by Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 655360 bytes for GrET in
/BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp. Out of
swap space?
#
# Internal Error (allocation.inline.hpp:42), pid=23486, tid=5
# Error: GrET in
/BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b19 mixed mode solaris-sparc)
# An error report file with more information is saved as:
# /dspace/hs_err_pid23486.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Abort - core dumped
Can someone please help us with this? The most recent failure came 4½ days
into the index rebuild, after indexing 104,082 out of 140,376 items. It now
looks like, if we want an accurate and complete index, we’re going to have to
start the rebuild all over again, with no guarantee that it will finish
successfully.
Any help would be much appreciated!
I’m attaching the core dump and a copy of our DSRUN to this email.
Thanks in advance,
Sue
Sue Walker-Thornton
NASA Langley Research Center
Integrated Library Systems
Developer, Application & Database Administrator
ConITS Contract ~ NCI Information Systems, Inc.
130 Research Drive
Hampton, VA 23666
Office: (757) 224-4074 ~ Mobile: (757) 506-9903 ~ Fax: (757) 224-4001
email: [email protected]
<hs_err_pid23486.log><ATT00001.c><ATT00002.c>
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech