I changed some DSpace config parameters and re-ran index-init. This time it 
completed successfully, and in less than half the time it took before.  I'm a 
bit worried about the impact of changing these parameters and would like some 
feedback on this.  Here are the changes I made to dspace.cfg (we are running 
version 1.5.1):

search.max-clauses:   The default is 2048.  The documentation says, "Higher 
values of search.max-clauses will enable prefix searches to work on large 
repositories".  Since we have a large repository, and we want it to be 
completely full-text searchable, I had ours set at 200,000.  I now have ours 
set at 4096.  I'm not exactly sure what the impact of this change is though.
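
For context on search.max-clauses: as far as I understand it, the value is 
handed to Lucene as the maximum number of clauses a single Boolean query may 
contain, and a prefix search such as aero* is expanded into one clause per 
indexed term that starts with the prefix. The sketch below is a simplified, 
stand-alone model of that expansion, not DSpace or Lucene code; the class name, 
method name, and sample terms are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified model of prefix-query expansion; NOT DSpace or Lucene source code.
class PrefixExpansionSketch {

    // Collects every indexed term that starts with the prefix (one clause per
    // term) and fails as soon as another clause would exceed maxClauses.
    static List<String> expand(SortedSet<String> indexedTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix in sorted order.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match the prefix
            }
            if (clauses.size() == maxClauses) {
                throw new IllegalStateException("too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 5000; i++) {
            terms.add("aero" + i); // 5000 hypothetical terms sharing one prefix
        }
        terms.add("wing");

        // A narrow prefix stays well under the limit.
        System.out.println("wing* clauses: " + expand(terms, "wing", 4096).size());

        // A broad prefix matches more terms than a limit of 4096 allows.
        try {
            expand(terms, "aero", 4096);
        } catch (IllegalStateException e) {
            System.out.println("aero* with limit 4096: " + e.getMessage());
        }

        // The same search fits under the old limit of 200,000.
        System.out.println("aero* clauses: " + expand(terms, "aero", 200000).size());
    }
}
```

If this model is accurate, lowering the limit removes nothing from the index; 
it only makes very broad prefix searches fail once they match more terms than 
the limit allows.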

search.maxfieldlength:  The default is 10000.  The documentation says, "Maximum 
number of terms indexed for a single field in Lucene. Default is 10,000 words - 
often not enough for full-text indexing.  If you change this, you'll need to 
re-index for the change to take effect on previously added items.  -1 = 
unlimited (Integer.MAX_VALUE)."  Again, since we have a large repository, and 
we want it to be completely full-text searchable, I had ours set at -1 
(unlimited).  I now have it set back at the default - 10000.
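
To illustrate what a per-field term limit means for full-text search: going by 
the documentation quoted above, only the first search.maxfieldlength terms of a 
field are indexed, so words that appear later in a long document would not be 
searchable. The sketch below is a simplified, stand-alone model of that 
behaviour, not DSpace or Lucene code; the class and method names are made up, 
and splitting on whitespace is an assumption standing in for the real analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of a per-field term limit; NOT DSpace or Lucene source code.
class FieldLengthSketch {

    // Returns the searchable terms when only the first maxFieldLength
    // whitespace-separated tokens are indexed. A negative limit means unlimited.
    static Set<String> indexedTerms(String text, int maxFieldLength) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        int limit = maxFieldLength < 0
                ? tokens.size()
                : Math.min(maxFieldLength, tokens.size());
        return new HashSet<>(tokens.subList(0, limit));
    }

    public static void main(String[] args) {
        // Build a hypothetical 12,000-word document: word0 word1 ... word11999
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            doc.append("word").append(i).append(' ');
        }
        String text = doc.toString();

        Set<String> limited = indexedTerms(text, 10000);
        Set<String> unlimited = indexedTerms(text, -1);

        // A term near the start is searchable under both settings.
        System.out.println("early term, limit 10000: " + limited.contains("word5"));
        // A term past the 10,000th position is searchable only when unlimited.
        System.out.println("late term, limit 10000:  " + limited.contains("word11999"));
        System.out.println("late term, unlimited:    " + unlimited.contains("word11999"));
    }
}
```

Under this model, the trade-off of going back to 10000 would be that text 
beyond the first 10,000 terms of each field is no longer searchable, in 
exchange for a smaller index and less work per item.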


After I made these changes, I re-ran index-init and, instead of taking 5-6 days 
to complete, it completed in about a day and a half.  It also did NOT get that 
memory error (below) it got the last 2 times I tried to run it with the old 
search.max-clauses and search.maxfieldlength settings we were using.


Here are our other search parameters in dspace.cfg:
search.operator:    We have ours set at OR.

search.index.?:
##### Fields to Index for Search #####

# DC metadata elements.qualifiers to be indexed for search
# format: - search.index.[number] = [search field]:element.qualifier
#         - * used as wildcard

###      changing these will change your search results,     ###
###  but will NOT automatically change your search displays  ###

search.index.1 = author:dc.contributor.author
search.index.2 = corpauthor:dc.contributor.corpAuthor
search.index.3 = corpauthor:dc.contributor.authorAffiliation
search.index.4 = title:dc.title
search.index.5 = titlecontrolkey:dc.identifier.titleControlKey
search.index.6 = accessionnumber:dc.identifier.accessionNumber
search.index.7 = reportnumber:dc.identifier.reportNumber
search.index.8 = subjectkeyword:dc.subject.keywords


webui.browse.index.?:

webui.browse.index.1 = dateissued:item:dateissued:desc
webui.browse.index.2 = author:metadata:dc.contributor.author:text
webui.browse.index.3 = title:item:title
webui.browse.index.4 = subject:metadata:dc.subject.keywords:text
webui.browse.index.5 = dateaccessioned:item:dateaccessioned:desc
webui.browse.index.6 = corpauthor:metadata:dc.contributor.corpAuthor:text


Since the changes I made resulted in index-init completing much quicker than 
before, and it seems to have gotten rid of the Memory/Out of Swap space error, 
I'm wondering what we lost, if anything, in our search results or if this 
should even be a concern for us.

Any suggestions/advice would be appreciated!
Thanks,
Sue

From: Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY] 
[mailto:[email protected]]
Sent: Saturday, June 19, 2010 7:51 PM
To: [email protected]
Cc: Kimbrough, Glenn W. (LARC-B7)[NCI]; Warren, Douglas Lewis (LARC-B7)[NCI]; 
Smail, James W. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]
Subject: [Dspace-tech] java.lang.outOfMemory error trying to run index-init

We have a large repository, currently with 140,376 Items.  Due to user 
complaints about search results, we recently turned off stemming in our DSpace 
1.5.1 search by commenting out the following line in DSAnalyzer.java:

result = new PorterStemFilter(result);

Of course then we had to run index-init to rebuild the search indexes and we've 
been having problems getting the job to finish.  Due to the size of our 
repository, index-init takes about 5 or 6 days to complete and now it's failed 
twice due to the following error:


An unexpected error has been detected by Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 655360 bytes for GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp. Out of swap space?
#
#  Internal Error (allocation.inline.hpp:42), pid=23486, tid=5
#  Error: GrET in /BUILD_AREA/jdk6_04/hotspot/src/share/vm/utilities/growableArray.cpp
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b19 mixed mode solaris-sparc)
# An error report file with more information is saved as:
# /dspace/hs_err_pid23486.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Abort - core dumped
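
The "Out of swap space?" wording and the path into growableArray.cpp suggest, 
as far as I can tell, that the JVM failed a native memory allocation rather 
than filling the Java heap. One way to see which heap limits a JVM actually 
gets is the small sketch below. It uses only java.lang.Runtime; the class name 
is hypothetical, and running it with the same java binary and memory options 
as index-init is an assumption about how it would be used.

```java
// Prints the heap limits visible to the running JVM; a diagnostic sketch only.
class JvmMemoryReport {

    // Converts a byte count to whole megabytes (rounding down).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to (controlled by -Xmx).
        System.out.println("max heap (MB):       " + toMegabytes(rt.maxMemory()));
        // Heap currently reserved by the JVM.
        System.out.println("committed heap (MB): " + toMegabytes(rt.totalMemory()));
        // Unused portion of the committed heap.
        System.out.println("free heap (MB):      " + toMegabytes(rt.freeMemory()));
    }
}
```

A large maximum heap leaves less of the process address space and swap for the 
JVM's own native allocations, so these numbers are worth comparing against the 
memory available on the machine.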



Can someone please help us with this?  This most recent time index-init failed 
was 4½ days into the index rebuild - after indexing 104,082 out of 140,376 
items and now it looks like if we want an accurate and complete index, we're 
going to have to start all over again with the rebuild and there's no guarantee 
it will finish successfully.



Any help would be much appreciated!



I'm attaching the core dump and a copy of our DSRUN to this email.



Thanks in advance,

Sue


Sue Walker-Thornton
NASA Langley Research Center
Integrated Library Systems
Developer, Application & Database Administrator
ConITS Contract ~ NCI Information Systems, Inc.
130 Research Drive
Hampton, VA  23666
Office: (757) 224-4074 ~ Mobile: (757) 506-9903 ~ Fax: (757) 224-4001
email:  [email protected]


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech