You could try raising the Java heap limit (the -Xmx option) in
/dspace/bin/dsrun to something higher, depending on the resources
available on the machine.


java -Xmx768m -classpath $FULLPATH "$@"


Setting this high has gotten me through some large one-time tasks, such as
stats-log-converter, where I would otherwise eventually hit a Java heap
space error.
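
If it helps, here is a minimal sketch of how that edit could be scripted.
set_xmx is a hypothetical helper, not something that ships with DSpace. It
assumes the script contains a plain -Xmx<number><unit> token like the java
line above, and it leaves a backup next to the script as <script>.bak.

```shell
# Hypothetical helper (not part of DSpace): set the -Xmx value in a
# dsrun-style launcher script.
# Usage: set_xmx <script> <size>    e.g. set_xmx /dspace/bin/dsrun 768m
set_xmx() {
    script="$1"
    size="$2"
    # Keep a backup copy before touching the script.
    cp "$script" "$script.bak" || return 1
    # Replace the first -Xmx<number><unit> token on each line with the new
    # size, writing back into the original file so its permissions are kept.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx$size/" "$script.bak" > "$script"
}
```

For example, set_xmx /dspace/bin/dsrun 768m would reproduce the line shown
above. The right value depends on how much memory the machine can spare.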

Peter Dietz
Systems Developer/Engineer
Ohio State University Libraries



On Mon, Jun 21, 2010 at 1:50 PM, Richard Rodgers <[email protected]> wrote:

> Hi Sue:
>
> I don't have any immediate help, but I'm struck by how long the indexing
> job is taking. I had a comparison done with one of our DSpace 1.6
> repositories which is about half the size of yours
> (71,481 items), and is mostly text-based content (which I think yours is
> also?) On not particularly fast hardware, a complete re-index took about 5
> hours - not 5 days.
>
> There may be some subtle limit in the code based on size - so to get
> started, I did a 'profile' of our repo with respect to full-text content
> (which I am assuming accounts for most of the indexing time - but I could be
> wrong). Here is the 'profile' and the queries we used to get it. I'd be
> interested to see what your repo looks like using the same metrics.
>
> count of items:                                              71,481
> count of bitstreams in text extract bundles (TEXT):          89,993
> sum of all file sizes in text extract bundles (bytes):       7,695,414,829
> average size of text extract bitstream (bytes):              85,511
>
> Queries used:
>
> select count(bs.bitstream_id)
> from bundle b, bundle2bitstream b2b, bitstream bs
> where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id
> and b.name = 'TEXT'
>
>
> select sum(bs.size_bytes)
> from bundle b, bundle2bitstream b2b, bitstream bs
> where b2b.bundle_id = b.bundle_id and b2b.bitstream_id = bs.bitstream_id
> and b.name = 'TEXT'
>
> Thanks,
>
> Richard
>
>
> On Jun 19, 2010, at 7:50 PM, Thornton, Susan M. (LARC-B702)[RAYTHEON
> TECHNICAL SERVICES COMPANY] wrote:
>
> We have a large repository, currently with 140,376 Items.  Due to user
> complaints about search results, we recently turned off stemming in our
> DSpace 1.5.1 search by commenting out the following line in DSAnalyzer.java:
>
> result = new PorterStemFilter(result);
>
> Of course then we had to run index-init to rebuild the search indexes and
> we’ve been having problems getting the job to finish.  Due to the size of
> our repository, index-init takes about 5 or 6 days to complete and now it’s
> failed twice due to the following error:
>
> An unexpected error has been detected by Java Runtime Environment:
> #
> # java.lang.OutOfMemoryError: requested 655360 bytes for GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp. Out of swap space?
> #
> #  Internal Error (allocation.inline.hpp:42), pid=23486, tid=5
> #  Error: GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp
> #
> # Java VM: Java HotSpot(TM) Server VM (10.0-b19 mixed mode solaris-sparc)
> # An error report file with more information is saved as:
> # /dspace/hs_err_pid23486.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Abort - core dumped
>
> Can someone please help us with this?  This most recent time, index-init
> failed 4½ days into the index rebuild, after indexing 104,082 of 140,376
> items. It now looks like, if we want an accurate and complete index, we’re
> going to have to start the rebuild all over again, with no guarantee that
> it will finish successfully.
>
> Any help would be much appreciated!
>
> I’m attaching the core dump and a copy of our DSRUN to this email.
>
> Thanks in advance,
> Sue
>
>
> Sue Walker-Thornton
> NASA Langley Research Center
> Integrated Library Systems
> Developer, Application & Database Administrator
> ConITS Contract ~ NCI Information Systems, Inc.
> 130 Research Drive
> Hampton, VA  23666
> Office: (757) 224-4074 ~ Mobile: (757) 506-9903 ~ Fax: (757) 224-4001
> email: [email protected]
>
> <hs_err_pid23486.log><ATT00001.c><ATT00002.c>
>
>
>
>
> ------------------------------------------------------------------------------
> ThinkGeek and WIRED's GeekDad team up for the Ultimate
> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
> lucky parental unit.  See the prize list and enter to win:
> http://p.sf.net/sfu/thinkgeek-promo
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
>