Hi Shawna,
The browse list is generated from the sort_author column in the database
(ItemsByAuthor table). This column is not displayed (the list itself shows
the values from the author column); it holds a 'normalized' version of the
name, which currently just means lower-cased.
Because the name in this case doesn't start with an unaccented character, it
isn't showing up. What you need is for the diacritics to be replaced with
their unaccented equivalents when the sort_author column is populated. That
is done in org/dspace/browse/Browse.java, in itemAdded(), around line 568.
The easiest way to do that at the moment, IMHO, would be to use the
ISOLatin1AccentFilter from Lucene:
<http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/ISOLatin1AccentFilter.html#removeAccents(java.lang.String)>
So, that line 568 would change from:
row.setColumn("sort_author", value.toLowerCase());
to
row.setColumn("sort_author",
ISOLatin1AccentFilter.removeAccents(value.toLowerCase()));
I've attached a patch that will make the necessary changes to Browse.java -
this is generated against the current CVS HEAD.
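One thing worth checking with the Lucene filter: as far as I can tell it only maps characters from the ISO Latin-1 range, and Ž (U+017D) is in Latin Extended-A, so it may come through unchanged. If that turns out to be the case, a rough alternative is Unicode decomposition via java.text.Normalizer (needs Java 6). The sketch below is only an illustration: the SortKey class and its normalize() method are made-up names, not anything in DSpace.

```java
import java.text.Normalizer;
import java.util.Locale;

public class SortKey {
    // Hypothetical helper (not part of DSpace): lower-case the value,
    // decompose accented letters into base letter + combining mark (NFD),
    // then drop the combining marks.
    static String normalize(String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u017D is Z with caron, written as an escape so the source
        // file encoding doesn't matter.
        System.out.println(normalize("\u017Dekulin, Nicholas G."));
    }
}
```

Letters with no canonical decomposition (ø, æ, ß and so on) pass through this untouched, so it complements a mapping table like the Lucene one rather than replacing it.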
BTW, I tried searching for Žekulin in your repository, and it didn't
find any results. The search box after submission shows:
Žekulin
Has the UTF-8 encoding been set correctly in the Tomcat config?
(From <http://www.dspace.org/technology/system-docs/install.html>; note the
URIEncoding="UTF-8" attribute.)
<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
<Connector port="8080"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true"
URIEncoding="UTF-8" />
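To show why that attribute matters: the browser sends the query as percent-encoded UTF-8 bytes, and without URIEncoding set, Tomcat (as far as I know) decodes request URIs as ISO-8859-1 by default. The snippet below just uses the standard java.net classes to mimic the two decodings; UriEncodingDemo and decode() are made-up names for illustration, and it doesn't touch Tomcat itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UriEncodingDemo {
    // Hypothetical helper: decode a percent-encoded query value with the
    // given charset, roughly what the container does with the request URI.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // What a browser sends for the search term: Z-caron as two UTF-8 bytes.
        String sent = URLEncoder.encode("\u017Dekulin", "UTF-8");
        System.out.println(sent);

        // Decoded with the wrong charset, each byte becomes its own character.
        System.out.println(decode(sent, "ISO-8859-1"));

        // Decoded as UTF-8, the original name comes back.
        System.out.println(decode(sent, "UTF-8"));
    }
}
```

If the search term arrives mangled like the ISO-8859-1 case, it won't match anything in the index, which would explain the empty result.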
G
----- Original Message -----
From: "Shawna Sadler" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 01, 2007 9:09 PM
Subject: [Dspace-tech] Diacritics & Indexing
Hi everyone,
I've run into a new problem, this author has a diacritic on the first
letter of his name: Žekulin, Nicholas G. and it's not being picked up by
the indexing feature.
https://dspace.ucalgary.ca/handle/1880/44267/browse-author
Any suggestions?
Shawna
--
Shawna Sadler
Coordinator, Digital Initiatives
Libraries & Cultural Resources
University of Calgary
Phone: (403) 220-3739
Email: [EMAIL PROTECTED]
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
### Eclipse Workspace Patch 1.0
#P dspace
Index: src/org/dspace/browse/Browse.java
===================================================================
RCS file: /cvsroot/dspace/dspace/src/org/dspace/browse/Browse.java,v
retrieving revision 1.47
diff -u -r1.47 Browse.java
--- src/org/dspace/browse/Browse.java 12 Sep 2006 11:22:13 -0000 1.47
+++ src/org/dspace/browse/Browse.java 1 Mar 2007 22:12:04 -0000
@@ -56,6 +56,7 @@
import java.util.WeakHashMap;
import org.apache.log4j.Logger;
+import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.dspace.content.Collection;
import org.dspace.content.Community;
import org.dspace.content.DCValue;
@@ -562,9 +563,9 @@
else if ("ItemsByAuthor".equals(table))
{
// author name, and normalized sorting name
- // (which for now is simple lower-case)
+ // lower-case, and replace ISO Latin diacritics
row.setColumn("author", value);
- row.setColumn("sort_author", value.toLowerCase());
+ row.setColumn("sort_author", ISOLatin1AccentFilter.removeAccents(value.toLowerCase()));
}
else if ("ItemsByTitle".equals(table))
{