Just replying to my own email to record the answer.

DSpace 1.8 included an upgrade to Lucene 3.3.0. Lucene seems to have
changed StandardAnalyzer (as of 3.1) so that it is no longer an
English-language analyser. From the javadocs:

  "ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior
   to 3.1. As of 3.1, {@link StandardAnalyzer} implements Unicode text
   segmentation, as specified by UAX#29."

I think the old behaviour can be reinstated by setting the following in
dspace.cfg:

search.analyzer = org.apache.lucene.analysis.standard.ClassicAnalyzer
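To make the difference Robin describes concrete, here is a minimal plain-Java
sketch. It does not use Lucene and does not reproduce what ClassicAnalyzer or
StandardAnalyzer actually do; the class ApostropheTokenizing and its two
methods are hypothetical. One treats every apostrophe as a delimiter (the 1.8
behaviour reported below), the other keeps an apostrophe that sits between
letters inside the token (the 1.6 behaviour).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene or DSpace code.
public class ApostropheTokenizing {

    // Rule A (hypothetical): any run of non-letter, non-digit characters is a
    // delimiter, so "O'Connor" becomes two tokens.
    static List<String> splitOnApostrophe(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {   // split() can yield a leading empty string
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Rule B (hypothetical): an apostrophe that follows a letter or digit and
    // is followed by a letter stays inside the token, so "O'Connor" is one token.
    static List<String> keepInnerApostrophe(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean inner = c == '\'' && cur.length() > 0
                    && i + 1 < text.length()
                    && Character.isLetter(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || inner) {
                cur.append(c);
            } else if (cur.length() > 0) {   // any other character ends the token
                tokens.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            tokens.add(cur.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnApostrophe("O'Connor"));   // [O, Connor]
        System.out.println(keepInnerApostrophe("O'Connor")); // [O'Connor]
    }
}
```

Whichever rule the analyzer applies at index time should also apply at query
time, otherwise a search for "O'Connor" may not match what was indexed.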

Cheers.


On 01/11/12 16:11, TAYLOR Robin wrote:
> Hi all,
>
> I've just been comparing a search at DSpace version 1.6 with a search at
> 1.8 and noticed that at 1.8 an apostrophe is treated as a token
> delimiter, so a search term of "O'Connor" is split into "O" and
> "Connor", whereas at 1.6 it was treated as one token. I presume it was a
> conscious change made at some point and I was just wondering when and
> where (in terms of the source code). It's not a problem for me; I just
> need to be able to provide an explanation to the repository administrator.
>
> Thanks, Robin.
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
