Hi,
Could one of the developers, or anyone else who knows the answer, give me
a short explanation of what happens in index-all when the search indices are
built? Where is the data that ends up in the indices extracted from (the
.txt files, I'm assuming)? How do the configuration parameters in dspace.cfg
affect the building of the indices?
What are the best parameters to use with org.dspace.search.DSIndexer so that
the least amount of work has to be done to update the indices? It appears that
we may not be using the correct/best parameters with this program in
index-all: all the files under /dspace/search seem to be deleted and
completely recreated from scratch each time we run the job. As a result,
index-all is taking 2½ days to run, and our repository is getting bigger
every day.
A couple more questions: what determines how many files get built in
/dspace/search, and what, if anything, should I be able to tell about my
search configuration (in dspace.cfg) by looking at the files under
/dspace/search?
We are having some problems with our online search in DSpace 1.4.2. For
one thing, as mentioned above, our index-all cron job is taking over 2 days
to run to completion. Is this normal for a repository that currently has
approximately 130,000 items in it? For reference, here are our configuration
parameters for searching DSpace, from dspace.cfg:
##### Search settings #####
# Where to put search index files
search.dir = ${dspace.dir}/search
# Higher values of search.max-clauses will enable prefix searches to work on
# large repositories
# search.max-clauses = 2048
search.max-clauses = 102400
# Which Lucene Analyzer implementation to use. If this is omitted or
# commented out, the standard DSpace analyzer (designed for English)
# is used by default.
# search.analyzer = org.dspace.search.DSAnalyzer
# Chinese analyzer
# search.analyzer = org.apache.lucene.analysis.cn.ChineseAnalyzer
# Boolean search operator to use, current supported values are OR and AND
# If this config item is missing or commented out, OR is used
# AND requires all search terms to be present
# OR requires one or more search terms to be present
search.operator = AND
##### Fulltext Indexing settings #####
# Maximum number of terms indexed for a single field in Lucene.
# Default is 10,000 words - often not enough for full-text indexing.
# If you change this, you'll need to re-index for the change
# to take effect on previously added items.
# -1 = unlimited (Integer.MAX_VALUE)
search.maxfieldlength = -1
##### Fields to Index for Search #####
# DC metadata elements.qualifiers to be indexed for search
# format: - search.index.[number] = [search field]:element.qualifier
# - * used as wildcard
### changing these will change your search results, ###
### but will NOT automatically change your search displays ###
#search.index.1 = author:dc.contributor.*
#search.index.2 = author:dc.creator.*
#search.index.3 = title:dc.title.*
#search.index.4 = keyword:dc.subject.keywords
#search.index.5 = abstract:dc.description.abstract
#search.index.6 = description:dc.description.*
#search.index.7 = identifier:dc.identifier.*
#search.index.6 = author:dc.description.statementofresponsibility
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = identifier:dc.identifier.titleControlKey
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = identifier:dc.identifier.*
search.index.12 = language:dc.language.iso
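
In case it helps anyone reading the settings above, here is a minimal Python sketch that groups the active search.index.N lines by search field and flags entries that look redundant. It is not part of DSpace: the function names parse_search_indexes and shadowed are made up for illustration, and it assumes the "*" wildcard behaves like a simple glob on the metadata field name, which I have not confirmed against the DSIndexer source.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase

# A few of the active lines from our dspace.cfg, plus one commented-out line.
CFG = """\
#search.index.7 = identifier:dc.identifier.*
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = identifier:dc.identifier.titleControlKey
search.index.8 = abstract:dc.description.tableofcontents
search.index.11 = identifier:dc.identifier.*
"""

# search.index.[number] = [search field]:element.qualifier
LINE = re.compile(r"^search\.index\.(\d+)\s*=\s*(\w+):(\S+)$")


def parse_search_indexes(text):
    """Return {search_field: [(number, metadata_pattern), ...]}, skipping comments."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match:
            number, field, pattern = match.groups()
            fields[field].append((int(number), pattern))
    return dict(fields)


def shadowed(fields):
    """List entries already covered by a wildcard entry in the same search field.

    Assumption: '*' is treated as a glob over the metadata field name.
    """
    hits = []
    for field, entries in fields.items():
        for number, pattern in entries:
            for other_number, other in entries:
                if (number != other_number and other.endswith("*")
                        and fnmatchcase(pattern, other)):
                    hits.append((field, number, pattern, other_number, other))
    return hits


if __name__ == "__main__":
    parsed = parse_search_indexes(CFG)
    for field, entries in sorted(parsed.items()):
        print(field, entries)
    for hit in shadowed(parsed):
        print("possibly redundant:", hit)
```

If the glob assumption holds, this would suggest that search.index.6 (dc.identifier.titleControlKey) is already covered by search.index.11 (dc.identifier.*), since both feed the identifier search field.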
Finally, can anyone point me to some good documentation on the search
parameters in dspace.cfg in 1.4.2 and how to set them in order to maximize your
search capabilities and the integrity of search results?
Thanks in advance,
Sue
Sue Walker-Thornton
ConITS Contract
NASA Langley Research Center
Integrated Library Systems Application & Database Administrator
130 Research Drive
Hampton, VA 23666
Office: (757) 224-4074
Fax: (757) 224-4001
Pager: (757) 988-2547
Email: [email protected]
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech