Shawna, I found your author!
Take a look at:
https://dspace.ucalgary.ca/browse-author?top=Austad%2C+Michele+L
(you may need to 'next page' if any authors have been added after I sent
this)
All hail the multi-byte character ordering!!
Anyway, I tested your specific scenario, and the ISO-Latin character
filter won't actually fix this particular issue (Ž isn't an ISO-Latin
character).
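A quick way to confirm that, as a minimal sketch: ask the JDK's ISO-8859-1 encoder whether it can represent the character. I am assuming "ISO-Latin" here means ISO-8859-1 (Latin-1), and the class and method names (Latin1Check, isLatin1) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Latin1Check {
    // Returns true if every character in s can be encoded as ISO-8859-1.
    // (Assumption: "ISO-Latin" in the text above means ISO-8859-1.)
    public static boolean isLatin1(String s) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        // \u00C9 is E with acute, which lies inside the Latin-1 range.
        System.out.println(isLatin1("\u00C9mile"));
        // \u017D is Z with caron, which lies outside the Latin-1 range.
        System.out.println(isLatin1("\u017Dekulin"));
    }
}
```

The first call reports true and the second false, which is why a filter limited to Latin-1 characters leaves Ž untouched.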
It could be fixed with java.text.Normalizer, except that it only exists
as of JDK 1.6 :-(. However, IBM offers basically the same thing in ICU4J,
free and open source, under the X public license.
If you download the icu4j-3_6.jar from here:
http://icu.sourceforge.net/download/3.6.html#ICU4J
(place it in the lib directory) and apply the attached patch, the
sort_author value for each new entry will have its diacritics stripped as
it is entered into your system.
The patch also includes a simple class that removes diacritics from
sort_author in the existing rows in the ItemsByAuthor table.
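For reference, here is roughly what the patch does to each author name, as a standalone sketch. It uses java.text.Normalizer only so that it runs without extra jars, which assumes a JDK 1.6 or later runtime; the patch itself calls com.ibm.icu.text.Normalizer.normalize(value, Normalizer.NFD) from ICU4J instead. The class and method names (SortKeyDemo, toSortKey) are made up for illustration.

```java
import java.text.Normalizer;

public class SortKeyDemo {
    // Builds a sort key the way the patch does:
    // 1. decompose to NFD, so an accented letter becomes base letter + combining mark(s);
    // 2. strip everything in the Combining Diacritical Marks block;
    // 3. lower-case, as the original code already did.
    public static String toSortKey(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return stripped.toLowerCase();
    }

    public static void main(String[] args) {
        // \u017D is Z with caron; the resulting key starts with a plain "z".
        System.out.println(toSortKey("\u017Dekulin, Nicholas G."));
    }
}
```

Note that this only helps for letters that have a canonical decomposition. Letters such as ø or ł do not decompose into a base letter plus a combining mark, so they pass through unchanged.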
After patching, installing the jar, running 'ant update', etc., run the
following from the 'bin' directory of your DSpace installation:
dsrun org.dspace.browse.NormalizeSortAuthors
to 'fix' your existing entries.
G
On Thu, 2007-03-01 at 14:09 -0700, Shawna Sadler wrote:
> Hi everyone,
> I've run into a new problem, this author has a diacritic on the first
> letter of his name: Žekulin, Nicholas G. and it's not being picked up by
> the indexing feature.
> https://dspace.ucalgary.ca/handle/1880/44267/browse-author
>
> Any suggestions?
> Shawna
>
### Eclipse Workspace Patch 1.0
#P dspace
Index: src/org/dspace/browse/Browse.java
===================================================================
RCS file: /cvsroot/dspace/dspace/src/org/dspace/browse/Browse.java,v
retrieving revision 1.47
diff -u -r1.47 Browse.java
--- src/org/dspace/browse/Browse.java 12 Sep 2006 11:22:13 -0000 1.47
+++ src/org/dspace/browse/Browse.java 2 Mar 2007 10:38:14 -0000
@@ -67,6 +67,8 @@
import org.dspace.storage.rdbms.DatabaseManager;
import org.dspace.storage.rdbms.TableRow;
+import com.ibm.icu.text.Normalizer;
+
/**
* API for Browsing Items in DSpace by title, author, or date. Browses only
* return archived Items.
@@ -562,9 +564,13 @@
else if ("ItemsByAuthor".equals(table))
{
// author name, and normalized sorting name
- // (which for now is simple lower-case)
+ // lower case and replace diacritics
row.setColumn("author", value);
- row.setColumn("sort_author", value.toLowerCase());
+
+ String valueNormalized = Normalizer.normalize(value, Normalizer.NFD)
+         .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
+
+ row.setColumn("sort_author", valueNormalized.toLowerCase());
}
else if ("ItemsByTitle".equals(table))
{
Index: src/org/dspace/browse/NormalizeSortAuthors.java
===================================================================
RCS file: src/org/dspace/browse/NormalizeSortAuthors.java
diff -N src/org/dspace/browse/NormalizeSortAuthors.java
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/org/dspace/browse/NormalizeSortAuthors.java 1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,44 @@
+package org.dspace.browse;
+
+import org.dspace.core.Context;
+import org.dspace.storage.rdbms.DatabaseManager;
+import org.dspace.storage.rdbms.TableRow;
+import org.dspace.storage.rdbms.TableRowIterator;
+
+import com.ibm.icu.text.Normalizer;
+
+public class NormalizeSortAuthors
+{
+ /**
+ * @param args
+ */
+ public static void main(String[] args) throws Exception
+ {
+ Context c = new Context();
+
+ TableRowIterator authorIter = DatabaseManager.queryTable(c,
+ "ItemsByAuthor",
+ "SELECT * FROM ItemsByAuthor");
+
+ while (authorIter.hasNext())
+ {
+ TableRow authorRow = authorIter.next();
+ String sortAuthor = authorRow.getStringColumn("sort_author");
+ String sortAuthorNormalized;
+
+ // normalize the sort author
+ sortAuthorNormalized = Normalizer.normalize(sortAuthor, Normalizer.NFD)
+         .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
+
+ // if the sort author has been altered by the normalization, update the database
+ if (!sortAuthor.equals(sortAuthorNormalized))
+ {
+ authorRow.setColumn("sort_author", sortAuthorNormalized);
+ DatabaseManager.update(c, authorRow);
+ }
+ }
+
+ authorIter.close();
+ c.commit();
+ }
+}
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech