Hi Mark and Graham,

     I am having a problem in DSpace 1.4.2 that I hope you can help me
with.  I'm emailing you two because I saw both of your names in the doc
for DSIndexer.java.  Here's the situation:

 

     I happened to notice the other day that our online search results
are frequently not correct for metadata searches (not full-text).  I
have an idea of what might be happening, but I'm not quite sure how to
fix it.

 

These are relevant parameters from dspace.cfg:

 

##### Search settings #####

# Where to put search index files
search.dir = ${dspace.dir}/search

# Higher values of search.max-clauses will enable prefix searches to work on
# large repositories
# SMWT - 07/29/2008 - Change begins
# search.max-clauses = 2048
search.max-clauses = 102400
# SMWT - 07/29/2008 - Change ends

# Which Lucene Analyzer implementation to use.  If this is omitted or
# commented out, the standard DSpace analyzer (designed for English)
# is used by default.
# search.analyzer = org.dspace.search.DSAnalyzer

# Chinese analyzer
# search.analyzer = org.apache.lucene.analysis.cn.ChineseAnalyzer

# Boolean search operator to use, current supported values are OR and AND
# If this config item is missing or commented out, OR is used
# AND requires all search terms to be present
# OR requires one or more search terms to be present
search.operator = AND

search.maxfieldlength = -1

search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = author:dc.description.statementofresponsibility
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = identifier:dc.identifier.*
search.index.12 = language:dc.language.iso
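
As a sanity check on the mapping above, here is a minimal sketch that lists which search.index.N entries cover a given metadata field. The function name matching_indexes is hypothetical and not part of DSpace. It assumes a trailing ".*" means "any qualifier" and treats every other pattern as an exact match; whether DSpace itself also matches the unqualified field with ".*" is not something this sketch settles.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSpace): print the search.index.N entries
# in a dspace.cfg-style file whose metadata pattern covers a given field.
# Usage: matching_indexes <config-file> <schema.element[.qualifier]>
matching_indexes() {
    cfg=$1
    field=$2
    # Keep only active (uncommented) search.index.N lines.
    grep -E '^[[:space:]]*search\.index\.[0-9]+[[:space:]]*=' "$cfg" |
    while IFS='=' read -r key value; do
        # value looks like " identifier:dc.identifier.*"; drop the whitespace.
        value=$(printf '%s' "$value" | tr -d '[:space:]')
        index_name=${value%%:*}
        pattern=${value#*:}
        # Assumption: the pattern is used as a shell glob, so a trailing
        # ".*" matches any qualifier and anything else must match exactly.
        case $field in
            $pattern) printf '%s -> %s\n' "$index_name" "$pattern" ;;
        esac
    done
}
```

For example, running matching_indexes against dspace.cfg with the field dc.identifier.titleControlKey should report the identifier index if search.index.11 is active, which would suggest the configuration itself is not what keeps the field out of the index.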

 

 

 

This is our 'index-all' cron:

 

# Get the DSPACE/bin directory
BINDIR=`dirname $0`

echo "Creating browse index"
$BINDIR/dsrun org.dspace.browse.InitializeBrowse

echo "Creating search index"
$BINDIR/dsrun org.dspace.search.DSIndexer -bcfo

 

 

Here is an example of what's happening:

A DSpace Item has the metadata field dc.identifier.titleControlKey (a
non-standard Dublin Core qualifier we added to metadatafieldregistry
upon implementation), with the value 'a1120334'.  When I do a simple
search on a1120334 I get no results.  The advanced search also returns
nothing, whichever search type I choose.  We run index-all every night,
with no errors.  The interesting part is that if I edit the Item in the
Admin UI and simply press the "Update" button, then run the searches
again, both the Simple and Advanced Searches work as expected and return
the correct Item(s).

 

What this indicates to me is that the Item's metadata - for this Item
(and lots of other items too) - is NOT being written to the Search index
UNTIL the Item is edited and updated online.  After updating the Item
online, I also noticed this entry in dspace.log:

2008-12-18 13:40:07,967 INFO  org.dspace.search.DSIndexer @ Wrote Item: 2121/26391 to Index
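
One way to see whether the nightly index-all run ever logs such an entry for an item is to search the log for its handle. This is only a sketch: the function name index_writes_for is hypothetical, and it assumes each entry sits on a single line in the actual dspace.log and follows the "Wrote Item: <handle> to Index" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the dspace.log lines in which DSIndexer
# records writing the given handle to the search index.
# Usage: index_writes_for <log-file> <handle>
index_writes_for() {
    log=$1
    handle=$2
    # -F treats the whole string as fixed text, not a regular expression.
    grep -F "org.dspace.search.DSIndexer @ Wrote Item: $handle to Index" "$log"
}
```

Comparing the timestamps it prints with the time of the cron run would show whether the item was written only by the online update or also by the nightly job.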

 

Here are the entries from dspace.log, when I did one of the unsuccessful
searches, prior to updating the Item online:

2008-12-18 13:21:02,440 INFO  org.dspace.search.DSQuery @ Final query string: a1120334

2008-12-18 13:21:02,442 INFO  org.dspace.app.webui.servlet.SimpleSearchServlet @ [email protected]:session_id=8CB824C924B1313A744CF7CD7160D20A:ip_addr=198.119.152.109:search:query="a1120334",results=(0,0,0)
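
To gauge how many searches are affected, the log can be scanned for queries that ended with the same results=(0,0,0) marker. The sketch below is an assumption-laden illustration: zero_result_queries is a hypothetical name, and it assumes each search entry is a single line ending in query="...",results=(0,0,0) exactly as in the entry above.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct query strings that were logged
# with results=(0,0,0) in a dspace.log-style file.
# Usage: zero_result_queries <log-file>
zero_result_queries() {
    # Print only the text between query=" and ",results=(0,0,0) at end of line.
    sed -n 's/.*:search:query="\(.*\)",results=(0,0,0)$/\1/p' "$1" | sort -u
}
```

Some of the queries it lists will be legitimate misses, so the output is a starting point for spot checks rather than a count of unindexed Items.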

 

I was thinking that perhaps my 'search.max-clauses' parameter needs to
be increased since we do have a rather large repository (we currently
have 127,641 Items in our repository and are adding more every day) but
I'm not sure this is the problem and I don't know how high this number
can, or needs to, be.

 

I've also looked carefully at DSIndexer.java and I'm fairly certain I'm
using the correct command line parameters (-bcfo).

 

Any help you can give me (especially before our Users notice the search
problem!! :-) ) would certainly be appreciated.

 

Thanks in advance,

Sue

 

Sue Walker-Thornton

ConITS Contract
NASA Langley Research Center
Integrated Library Systems Application & Database Administrator

130 Research Drive

Hampton, VA  23666

Office: (757) 224-4074
Fax:    (757) 224-4001
Pager: (757) 988-2547 
Email:  [email protected]

 

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
