Hi,

     Could one of the developers, or anyone else who knows the answer, give me 
a short explanation of what happens in index-all when the search indices are 
built?  Where is the data that ends up in the indices extracted from (the .txt 
files, I'm assuming)?  How do the configuration parameters in dspace.cfg affect 
the building of the indices?  What are the best parameters to use with 
org.dspace.search.DSIndexer so that the least amount of work has to be done to 
update the indices?  It appears that we may not be using the correct/best 
parameters with this program in index-all: all the files under /dspace/search 
seem to be deleted and completely recreated from scratch each time we run the 
job, so index-all is taking 2 ½ days to run (and our repository is getting 
bigger every day).

 

     Another couple of questions: what determines how many files get built in 
/dspace/search, and what, if anything, should I be able to tell about my search 
configuration (in dspace.cfg) by looking at the files under /dspace/search?

 

     We are having some problems with our online search in DSpace 1.4.2.  For 
one thing, as mentioned above, our index-all cron job is taking over 2 days to 
run to completion.  Is this normal for a repository that currently has 
approximately 130,000 items in it?  For reference, here are our search 
configuration parameters from dspace.cfg:

 

##### Search settings #####

# Where to put search index files
search.dir = ${dspace.dir}/search

# Higher values of search.max-clauses will enable prefix searches to work on
# large repositories
# search.max-clauses = 2048
search.max-clauses = 102400

# Which Lucene Analyzer implementation to use.  If this is omitted or
# commented out, the standard DSpace analyzer (designed for English)
# is used by default.
# search.analyzer = org.dspace.search.DSAnalyzer

# Chinese analyzer
# search.analyzer = org.apache.lucene.analysis.cn.ChineseAnalyzer

# Boolean search operator to use, current supported values are OR and AND
# If this config item is missing or commented out, OR is used
# AND requires all search terms to be present
# OR requires one or more search terms to be present
search.operator = AND

##### Fulltext Indexing settings #####
# Maximum number of terms indexed for a single field in Lucene.
# Default is 10,000 words - often not enough for full-text indexing.
# If you change this, you'll need to re-index for the change
# to take effect on previously added items.
# -1 = unlimited (Integer.MAX_VALUE)
search.maxfieldlength = -1

##### Fields to Index for Search #####

# DC metadata elements.qualifiers to be indexed for search
# format: - search.index.[number] = [search field]:element.qualifier
#         - * used as wildcard

###      changing these will change your search results,     ###
###  but will NOT automatically change your search displays  ###

#search.index.1 = author:dc.contributor.*
#search.index.2 = author:dc.creator.*
#search.index.3 = title:dc.title.*
#search.index.4 = keyword:dc.subject.keywords
#search.index.5 = abstract:dc.description.abstract
#search.index.6 = description:dc.description.*
#search.index.7 = identifier:dc.identifier.*
#search.index.6 = author:dc.description.statementofresponsibility

search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = identifier:dc.identifier.titleControlKey
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = identifier:dc.identifier.*
search.index.12 = language:dc.language.iso
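
For illustration only, the search.index.[number] mapping above can be read as 
"several metadata patterns feed one named search field".  The following is a 
minimal Python sketch of that reading, not DSpace code: the function name 
group_search_indexes, the regular expression, and the SAMPLE text are all 
hypothetical, and it makes no claim about how DSIndexer actually parses 
dspace.cfg.

```python
import re
from collections import defaultdict

# Matches active (uncommented) lines such as:
#   search.index.4 = keyword:dc.subject.*
# This pattern is an assumption based on the format comment in dspace.cfg.
INDEX_LINE = re.compile(r"^\s*search\.index\.(\d+)\s*=\s*([^:\s]+):(\S+)\s*$")


def group_search_indexes(cfg_text):
    """Return {search field: [metadata patterns]} for active search.index.N lines."""
    groups = defaultdict(list)
    for line in cfg_text.splitlines():
        # Skip commented-out entries, e.g. "#search.index.3 = ..."
        if line.lstrip().startswith("#"):
            continue
        match = INDEX_LINE.match(line)
        if match:
            _number, field, pattern = match.groups()
            groups[field].append(pattern)
    return dict(groups)


# A shortened, hypothetical excerpt in the same format as the settings above.
SAMPLE = """\
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
#search.index.3 = title:dc.title.*
search.index.5 = abstract:dc.description.abstract
search.index.8 = abstract:dc.description.tableofcontents
"""

if __name__ == "__main__":
    for field, patterns in group_search_indexes(SAMPLE).items():
        print(field, "<-", ", ".join(patterns))
```

Run against the full list above, a grouping like this would show, for example, 
that both dc.identifier.titleControlKey and dc.identifier.* are mapped to the 
"identifier" search field.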

 

     Finally, can anyone point me to some good documentation on the search 
parameters in dspace.cfg in 1.4.2 and how to set them in order to maximize your 
search capabilities and the integrity of search results?

 

Thanks in advance,

Sue

 

Sue Walker-Thornton

ConITS Contract
NASA Langley Research Center
Integrated Library Systems Application & Database Administrator

130 Research Drive

Hampton, VA  23666

Office: (757) 224-4074
Fax:    (757) 224-4001
Pager: (757) 988-2547 
Email:  [email protected]

 
