Hi Annette,

We had similar issues with diacritics some time ago, so I hope I can remember the cause, or at least help you track it down.

Sorry if I have misunderstood, but I am not sure whether the problem occurs when searching or when clicking on an author entry while browsing by author. The two are similar, but internally they are not quite the same process.

When you search (using the search box) for Schmüser, does it find it? That search runs against the Lucene indexes, not the DSpace database. The indexing process should strip all special characters and store the words in lowercase in the index files (i.e. Schmüser is stored as schmuser, which is why you can also find it that way). If you want to be sure, or just want to have a look at the index files, use Luke (http://www.getopt.org/luke/), point it at dspace_install_dir/search, and see how the author names are stored for indexing.
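
To illustrate the kind of normalization I mean, here is a minimal sketch in Python. It is only an illustration and not DSpace's actual code (DSpace is Java, and DSAnalyzer may differ in detail); normalize_term is a hypothetical helper name.

```python
import unicodedata


def normalize_term(text: str) -> str:
    """Strip diacritics and lowercase a term (illustrative only)."""
    # NFD splits a letter such as "ü" into "u" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


print(normalize_term("Schmüser"))  # schmuser
```

With a normalization like this, both Schmüser and Schmuser map to the same stored term, which matches what you are seeing in search.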

Finding an author through the author list works a bit differently. Internally it uses a table in the DSpace db (in my case BI_1_DIS) where each author is stored together with a normalized version of the name. Take a look at it and check that the authors are correctly normalized (i.e. the column sort_value holds the names in lowercase, without special characters or diacritics).

In both cases the same stripping of diacritics and special characters must be applied (internally) to the term before it is compared against the index or the db. I think that is where the problem lies.

Some points to check:

1- In the DSpace config (dspace.cfg):

Take a look at the search settings; you probably have a line like

search.analyzer = org.dspace.search.DSAnalyzer

Also, under Browse Configuration, there is a block

# Set the options for how the indexes are sorted
#
# All sort normalisations are carried out by the OrderFormatDelegate.
# The plugin manager can be used to specify your own delegates for each datatype.
#
# The default datatypes (and delegates) are:

author = es.upna.dspace.upnaSort

# title  = org.dspace.sort.OrderFormatTitle
# text   = org.dspace.sort.OrderFormatText

You should check that org.dspace.search.DSAnalyzer and (in my case) es.upna.dspace.upnaSort perform the same stripping. If they do not, the comparison is not made on the same basis (a normalized author against a non-normalized string). If you do not specify an author = class, you fall back to the default DSpace mechanism for authors (I think org.dspace.sort.OrderFormatAuthor), which is probably not right in your case.
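
The check can be sketched like this in Python. The two functions are hypothetical stand-ins for whatever your analyzer and your sort delegate really do, not the DSpace classes themselves; the point is only to run the same sample names through both and list the ones that come out differently.

```python
import unicodedata


def strip_and_lower(text: str) -> str:
    """Stand-in for a normalizer that strips diacritics and lowercases."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def lower_only(text: str) -> str:
    """Stand-in for a normalizer that lowercases but keeps diacritics."""
    return text.lower()


def find_mismatches(names, index_side, browse_side):
    """Return (name, index form, browse form) for names the two sides disagree on."""
    return [
        (name, index_side(name), browse_side(name))
        for name in names
        if index_side(name) != browse_side(name)
    ]


mismatches = find_mismatches(["Schmüser", "Smith"], strip_and_lower, lower_only)
print(mismatches)  # [('Schmüser', 'schmuser', 'schmüser')]
```

Any name that shows up in such a list is one that would be stored under one form and looked up under another.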


2 - I do not think this is the cause, but just in case: when you search for Schmüser, even if there are no results, make sure it is displayed correctly as Schmüser on the results page (in the search box). If it is not, you have an encoding problem somewhere in the environment or application configuration (OS, Tomcat, ...). Also check that both pages use UTF-8 as their encoding.
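
To show what such an encoding problem looks like, here is a small Python sketch (again only an illustration, with round_trip as a hypothetical helper). It percent-encodes a value as a browser would with UTF-8 and then decodes it with a different charset, which is roughly what happens when the servlet container decodes request URIs as ISO-8859-1 while the pages are UTF-8.

```python
from urllib.parse import quote, unquote


def round_trip(value: str, encode_as: str, decode_as: str) -> str:
    """Percent-encode with one charset, then decode with another."""
    encoded = quote(value, encoding=encode_as)
    return unquote(encoded, encoding=decode_as)


print(quote("Schmüser", encoding="utf-8"))            # Schm%C3%BCser
print(round_trip("Schmüser", "utf-8", "utf-8"))       # Schmüser
print(round_trip("Schmüser", "utf-8", "iso-8859-1"))  # SchmÃ¼ser
```

If the search box shows something like the last line instead of Schmüser, the two sides do not agree on the encoding.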

I hope I have remembered this correctly and that it helps you find the issue.

Carlos




On 05/07/2011 17:18, Ramsden, Annette wrote:

Sorry to bother the list again, but we still cannot get special characters/diacritics to link. They are displayed; however, if you try to link through the subject/keyword it comes up with the following:

 

Can anyone suggest anything else to try?

 

Thanks

Annette

 

 

-----Original Message-----
From: Claudia Jürgen [mailto:[email protected]]
Sent: 18, February, 2011 14:55
To: [email protected]
Subject: Re: [Dspace-general] Special characters/diacritics

 

Hello,

if you are using tomcat check the encoding setting for the connector in server.xml

See chapter 3.2.6. Servlet Engine: (Jakarta Tomcat 4.x, Jetty, Caucho Resin or equivalent) of the 1.6.1 system documentation
http://www.dspace.org/1_6_1Documentation/DSpace-Manual.pdf

"Modifications in /[tomcat]/conf/server.xml :

You also need to alter Tomcat's default configuration to support searching and browsing of multi-byte UTF-8 correctly. You need to add a configuration option to the <Connector> element in [tomcat]/conf/server.xml:

URIEncoding="UTF-8"

e.g. if you're using the default Tomcat config, it should read:

<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
<Connector port="8080"
           maxThreads="150"
           minSpareThreads="25"
           maxSpareThreads="75"
           enableLookups="false"
           redirectPort="8443"
           acceptCount="100"
           connectionTimeout="20000"
           disableUploadTimeout="true"
           URIEncoding="UTF-8"/>
"

 

Hope that helps

Claudia Jürgen


On 18.02.2011 15:28, Ramsden, Annette wrote:

> Dear all
>
> We are operating D-Space 1.6.1. I wondered whether anyone else had a
> problem with diacritics/special characters? When we cut and paste
> names with umlauts or accents, such as Schmüser, the name is visible
> in the repository but brings up an error message when accessed and
> will not link to the record. If we remove the diacritic & spell the
> name without these, it does work. Also we have an item which has TM
> in its title & this will not link to the item even though the record
> is displayed. Is there anything that can be done? Has anyone else
> encountered this and found a fix?
>
> Thanks Annette
>
> Assistant Academic Librarian Information Services University of
> Abertay Dundee
>


> _______________________________________________ Dspace-general

> mailing list [email protected]

> https://lists.sourceforge.net/lists/listinfo/dspace-general

 

--
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/

-- 

Carlos Alonso Vega
Universidad Publica de Navarra
Servicio Informatico - Servicios Campus
Campus Arrosadia
31006 Pamplona - Spain
+34 948 16 96 52 [email protected]
