Hi Shawna,

The browse list is generated from the sort_author column in the database (ItemsByAuthor table). This is a non-visible column (the displayed list uses the values from the author column), and it contains a 'normalized' version of the name, which currently just means it is lower-cased.

Because the name in this case doesn't start with an unaccented character, it isn't showing up. What you need is for the diacritics to be replaced with their unaccented equivalents when populating the sort_author column. This is done in org/dspace/browse/Browse.java, in itemAdded(), around line 568.

The easiest way to do that at the moment, IMHO, would be to use the ISOLatin1AccentFilter from Lucene:
<http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/ISOLatin1AccentFilter.html#removeAccents(java.lang.String)>

So line 568 would change from:
row.setColumn("sort_author", value.toLowerCase());
to
row.setColumn("sort_author", ISOLatin1AccentFilter.removeAccents(value.toLowerCase()));
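
For what it's worth, here is a minimal, self-contained sketch of the same kind of normalization using only the JDK (java.text.Normalizer, available from Java 6) rather than the Lucene filter. The class and method names (SortAuthorSketch, sortKey) are made up for illustration and are not part of DSpace. One reason to consider it: Ž (U+017D) sits in Latin Extended-A rather than ISO Latin-1, so it is worth testing whether the Latin-1 filter actually folds it; the decomposition approach below does.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of DSpace.
final class SortAuthorSketch {

    // Builds a sort key: strips combining accents, then lower-cases.
    static String sortKey(String value) {
        // NFD splits an accented letter into base letter + combining mark(s),
        // e.g. U+017D becomes 'Z' followed by U+030C (combining caron).
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        // Remove every combining mark (Unicode general category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "\u017D" is Z with caron; prints: zekulin, nicholas g.
        System.out.println(sortKey("\u017Dekulin, Nicholas G."));
    }
}
```

Note that letters with no canonical decomposition (ø or ł, for example) pass through this sketch unchanged, so it is not a complete folding table either.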

I've attached a patch that will make the necessary changes to Browse.java - this is generated against the current CVS HEAD.

BTW - I tried to do a search for Žekulin in your repository, and it didn't find any results. The search box after submission shows:

Žekulin

Has the UTF-8 encoding been set correctly in the Tomcat config? Note the URIEncoding="UTF-8" attribute in this example from <http://www.dspace.org/technology/system-docs/install.html>:

<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
<Connector port="8080"
          maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
          enableLookups="false" redirectPort="8443" acceptCount="100"
          connectionTimeout="20000" disableUploadTimeout="true"
          URIEncoding="UTF-8" />
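
To illustrate why that attribute matters: the browser sends the query percent-encoded as UTF-8, and if the container decodes it as ISO-8859-1 (which I believe is the default for Tomcat connectors of this vintage when URIEncoding is not set), each multi-byte character turns into two wrong ones, so the search never matches. A small sketch using only java.net classes; UriEncodingSketch and roundTrip are hypothetical names, and this only mimics the decoding step rather than Tomcat's actual code:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo of a charset mismatch between client and server.
final class UriEncodingSketch {

    // Percent-encodes text as UTF-8 (as a browser would), then decodes it
    // with the given charset (as the server side would).
    static String roundTrip(String text, String serverCharset) {
        try {
            String encoded = URLEncoder.encode(text, "UTF-8"); // "%C5%BDekulin"
            return URLDecoder.decode(encoded, serverCharset);
        } catch (UnsupportedEncodingException e) {
            // Both charsets used here are required on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u017Dekulin"; // Z with caron + "ekulin"

        // Matching charsets: the original string comes back intact.
        System.out.println(roundTrip(name, "UTF-8").equals(name)); // true

        // Mismatch: bytes C5 BD are read as two Latin-1 characters,
        // U+00C5 and U+00BD, followed by "ekulin".
        String garbled = roundTrip(name, "ISO-8859-1");
        System.out.println(garbled.equals("\u00C5\u00BDekulin")); // true
    }
}
```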

G

----- Original Message -----
From: "Shawna Sadler" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 01, 2007 9:09 PM
Subject: [Dspace-tech] Diacritics & Indexing


Hi everyone,
I've run into a new problem: this author has a diacritic on the first
letter of his name (Žekulin, Nicholas G.), and it's not being picked up by
the indexing feature.
https://dspace.ucalgary.ca/handle/1880/44267/browse-author

Any suggestions?
Shawna

--
Shawna Sadler
Coordinator, Digital Initiatives
Libraries & Cultural Resources
University of Calgary
Phone: (403) 220-3739
Email: [EMAIL PROTECTED]


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

### Eclipse Workspace Patch 1.0
#P dspace
Index: src/org/dspace/browse/Browse.java
===================================================================
RCS file: /cvsroot/dspace/dspace/src/org/dspace/browse/Browse.java,v
retrieving revision 1.47
diff -u -r1.47 Browse.java
--- src/org/dspace/browse/Browse.java   12 Sep 2006 11:22:13 -0000      1.47
+++ src/org/dspace/browse/Browse.java   1 Mar 2007 22:12:04 -0000
@@ -56,6 +56,7 @@
 import java.util.WeakHashMap;

 import org.apache.log4j.Logger;
+import org.apache.lucene.analysis.ISOLatin1AccentFilter;
 import org.dspace.content.Collection;
 import org.dspace.content.Community;
 import org.dspace.content.DCValue;
@@ -562,9 +563,9 @@
                else if ("ItemsByAuthor".equals(table))
                {
                    // author name, and normalized sorting name
-                    // (which for now is simple lower-case)
+                    // lower-case, and replace ISO Latin diacritics
                    row.setColumn("author", value);
-                    row.setColumn("sort_author", value.toLowerCase());
+                    row.setColumn("sort_author", ISOLatin1AccentFilter.removeAccents(value.toLowerCase()));
                }
                else if ("ItemsByTitle".equals(table))
                {