Hi Mark and Graham,
I am having a problem in DSpace 1.4.2 that I hope you can help me
with. I'm emailing the two of you because I saw both of your names
in the doc for DSIndexer.java. Here's the situation:
I happened to notice the other day that our online search results
are frequently not correct for metadata searches (not full-text). I
have an idea of what might be happening, but I'm not quite sure how to
fix it.
These are relevant parameters from dspace.cfg:
##### Search settings #####
# Where to put search index files
search.dir = ${dspace.dir}/search
# Higher values of search.max-clauses will enable prefix searches to work on
# large repositories
# SMWT - 07/29/2008 - Change begins
# search.max-clauses = 2048
search.max-clauses = 102400
# SMWT - 07/29/2008 - Change ends
# Which Lucene Analyzer implementation to use. If this is omitted or
# commented out, the standard DSpace analyzer (designed for English)
# is used by default.
# search.analyzer = org.dspace.search.DSAnalyzer
# Chinese analyzer
# search.analyzer = org.apache.lucene.analysis.cn.ChineseAnalyzer
# Boolean search operator to use, current supported values are OR and AND
# If this config item is missing or commented out, OR is used
# AND requires all search terms to be present
# OR requires one or more search terms to be present
search.operator = AND
search.maxfieldlength = -1
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = author:dc.description.statementofresponsibility
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = identifier:dc.identifier.*
search.index.12 = language:dc.language.iso
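As I read it, search.index.11 should already cover our custom field,
since dc.identifier.titleControlKey falls under the dc.identifier.*
wildcard. The following is only a minimal shell sketch of that reading,
not DSpace's actual matching code; index_for_field is a hypothetical
helper and the list is a subset of the settings above:

```shell
#!/bin/sh
# Illustration only: mimics how I read the search.index.N patterns.
# This is not DSpace's matching code; index_for_field is a hypothetical helper.

# Subset of the search.index.N values from dspace.cfg above.
SEARCH_INDEXES='author:dc.contributor.*
title:dc.title.*
identifier:dc.identifier.*
language:dc.language.iso'

# Print the search index name(s) whose pattern covers the given metadata field.
index_for_field() {
    field=$1
    echo "$SEARCH_INDEXES" | while IFS=: read -r index pattern; do
        # Unquoted $pattern lets the trailing * act as a shell glob.
        case $field in
            $pattern) echo "$index" ;;
        esac
    done
}

index_for_field dc.identifier.titleControlKey   # prints: identifier
```

If that reading is right, the configuration itself is not what keeps
the field out of the index.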
This is our 'index-all' cron:
# Get the DSPACE/bin directory
BINDIR=`dirname $0`
echo "Creating browse index"
$BINDIR/dsrun org.dspace.browse.InitializeBrowse
echo "Creating search index"
$BINDIR/dsrun org.dspace.search.DSIndexer -bcfo
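To rule out a silent failure in the nightly run, each step could be
wrapped so that a non-zero exit status is reported. This is only a
sketch: run_step is a hypothetical wrapper of my own, and it assumes
dsrun passes through the exit status of the Java process.

```shell
#!/bin/sh
# Sketch only: run_step is a hypothetical wrapper, not part of DSpace.

# Usage: run_step <label> <command> [args...]
# Prints the label, runs the command, reports a non-zero exit status
# on stderr, and returns that status to the caller.
run_step() {
    label=$1
    shift
    echo "$label"
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED ($status): $label" >&2
    fi
    return "$status"
}

# How it could wrap the two steps from the script above:
#   run_step "Creating browse index" "$BINDIR/dsrun" org.dspace.browse.InitializeBrowse
#   run_step "Creating search index" "$BINDIR/dsrun" org.dspace.search.DSIndexer -bcfo
```

Cron mails anything written to stderr, so a failed step would show up
in the morning instead of passing unnoticed.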
Here is an example of what's happening:
A DSpace Item has a value of 'a1120334' in dc.identifier.titleControlKey
(a non-standard Dublin Core qualifier we added to metadatafieldregistry
upon implementation). When I do a simple search on a1120334, I get no
results. The advanced search, using any/all search types, also returns
no results. We run index-all every night, with no errors. The
interesting part is that if I edit the Item in the Admin UI and simply
press the "Update" button, then repeat the searches, both the Simple
and Advanced Searches work as expected and return the correct Item(s).
What this indicates to me is that the Item's metadata - for this Item
(and lots of other items too) - is NOT being written to the Search index
UNTIL the Item is edited and updated online. I've also noticed the
entries in dspace.log after updating the Item online:
2008-12-18 13:40:07,967 INFO org.dspace.search.DSIndexer @ Wrote Item: 2121/26391 to Index
Here are the entries from dspace.log, when I did one of the unsuccessful
searches, prior to updating the Item online:
2008-12-18 13:21:02,440 INFO org.dspace.search.DSQuery @ Final query string: a1120334
2008-12-18 13:21:02,442 INFO org.dspace.app.webui.servlet.SimpleSearchServlet @ [email protected]:session_id=8CB824C924B1313A744CF7CD7160D20A:ip_addr=198.119.152.109:search:query="a1120334",results=(0,0,0)
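One check I can run is whether the nightly index-all ever logs a
"Wrote Item" line for this handle. A minimal sketch, assuming the
command-line indexer writes to the same log in the same format as the
online update does; was_indexed is a hypothetical helper and the log
path is a placeholder:

```shell
#!/bin/sh
# Sketch only: was_indexed is a hypothetical helper, not part of DSpace.
# Assumes the indexer logs "Wrote Item: <handle> to Index" on one line.

# Usage: was_indexed <handle> <logfile>
# Prints the matching log lines; returns non-zero when there are none.
was_indexed() {
    handle=$1
    logfile=$2
    # -F treats the search string literally, so the characters in a
    # handle are never interpreted as regular-expression syntax.
    grep -F "Wrote Item: $handle to Index" "$logfile"
}

# Example call (substitute the real location of dspace.log):
#   was_indexed 2121/26391 /path/to/dspace.log || echo "no index entry logged"
```

If the only matches carry the timestamp of my online update, that would
support the idea that the nightly run is skipping the Item.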
I was thinking that perhaps my 'search.max-clauses' parameter needs to
be increased, since we have a rather large repository (we currently
have 127,641 Items and are adding more every day). But I'm not sure
this is the problem, and I don't know how high this number can, or
needs to, be.
I've also looked carefully at DSIndexer.java and I'm fairly certain I'm
using the correct command line parameters (-bcfo).
Any help you can give me (especially before our Users notice the search
problem!! :-) ) would certainly be appreciated.
Thanks in advance,
Sue
Sue Walker-Thornton
ConITS Contract
NASA Langley Research Center
Integrated Library Systems Application & Database Administrator
130 Research Drive
Hampton, VA 23666
Office: (757) 224-4074
Fax: (757) 224-4001
Pager: (757) 988-2547
Email: [email protected] <mailto:[email protected]>
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech