Shawna, I found your author!

Take a look at:

https://dspace.ucalgary.ca/browse-author?top=Austad%2C+Michele+L

(you may need to click 'next page' if any authors have been added since I
sent this)

All hail the multi-byte character ordering!!

Anyway, after testing your specific scenario, the ISO-Latin character
filter won't actually fix this particular issue (Ž isn't an ISO Latin-1
character).

It could be fixed with java.text.Normalizer - except that it only
exists from JDK 1.6 onwards :-(. However, IBM's ICU4J offers basically the
same thing, free and open source, under the X public license.

If you download the icu4j-3_6.jar from here:
http://icu.sourceforge.net/download/3.6.html#ICU4J

(place it in the lib directory) and apply the attached patch, this will
strip diacritics from sort_author as new authors are entered into your
system.
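
For anyone who wants to see the effect outside DSpace, here is a minimal standalone sketch of the same idea: decompose to NFD, strip the combining marks, then lower-case. It is an illustration rather than part of the patch. It assumes JDK 1.6 or later and uses java.text.Normalizer in place of the ICU4J class the patch imports; the class name SortKeyDemo and the helper sortKey are made up for this example.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of the attached patch.
class SortKeyDemo
{
    // Assumption: mirrors the patched sort_author logic, but with the JDK
    // Normalizer (Normalizer.Form.NFD) instead of com.ibm.icu.text.Normalizer.
    static String sortKey(String name)
    {
        // NFD splits an accented letter into base letter + combining mark(s)
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);

        // drop the combining marks, then lower-case as the patch does
        return decomposed
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                .toLowerCase();
    }

    public static void main(String[] args)
    {
        // \u017D is Z with caron, written as an escape to avoid
        // source-encoding problems
        System.out.println(sortKey("\u017Dekulin, Nicholas G."));
        // prints "zekulin, nicholas g."
    }
}
```

Letters that have no canonical decomposition (Ø and Ł, for example) pass through this unchanged, so they would still sort after 'z'.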

The patch also includes a simple class that removes diacritics from
sort_author in the existing rows in the ItemsByAuthor table.

After patching, installing the jar, running 'ant update', etc., from your
dspace installation 'bin' directory run:

dsrun org.dspace.browse.NormalizeSortAuthors

to 'fix' your existing entries.

G

On Thu, 2007-03-01 at 14:09 -0700, Shawna Sadler wrote:
> Hi everyone,
> I've run into a new problem, this author has a diacritic on the first 
> letter of his name: Žekulin, Nicholas G. and it's not being picked up by 
> the indexing feature.
> https://dspace.ucalgary.ca/handle/1880/44267/browse-author
> 
> Any suggestions?
> Shawna
> 
### Eclipse Workspace Patch 1.0
#P dspace
Index: src/org/dspace/browse/Browse.java
===================================================================
RCS file: /cvsroot/dspace/dspace/src/org/dspace/browse/Browse.java,v
retrieving revision 1.47
diff -u -r1.47 Browse.java
--- src/org/dspace/browse/Browse.java   12 Sep 2006 11:22:13 -0000      1.47
+++ src/org/dspace/browse/Browse.java   2 Mar 2007 10:38:14 -0000
@@ -67,6 +67,8 @@
 import org.dspace.storage.rdbms.DatabaseManager;
 import org.dspace.storage.rdbms.TableRow;
 
+import com.ibm.icu.text.Normalizer;
+
 /**
  * API for Browsing Items in DSpace by title, author, or date. Browses only
  * return archived Items.
@@ -562,9 +564,13 @@
                 else if ("ItemsByAuthor".equals(table))
                 {
                     // author name, and normalized sorting name
-                    // (which for now is simple lower-case)
+                    // lower case and remove diacritics
                     row.setColumn("author", value);
-                    row.setColumn("sort_author", value.toLowerCase());
+
+                    String valueNormalized = Normalizer.normalize(value, Normalizer.NFD)
+                                    .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
+                    
+                    row.setColumn("sort_author", valueNormalized.toLowerCase());
                 }
                 else if ("ItemsByTitle".equals(table))
                 {
Index: src/org/dspace/browse/NormalizeSortAuthors.java
===================================================================
RCS file: src/org/dspace/browse/NormalizeSortAuthors.java
diff -N src/org/dspace/browse/NormalizeSortAuthors.java
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ src/org/dspace/browse/NormalizeSortAuthors.java     1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,44 @@
+package org.dspace.browse;
+
+import org.dspace.core.Context;
+import org.dspace.storage.rdbms.DatabaseManager;
+import org.dspace.storage.rdbms.TableRow;
+import org.dspace.storage.rdbms.TableRowIterator;
+
+import com.ibm.icu.text.Normalizer;
+
+public class NormalizeSortAuthors
+{
+    /**
+     * @param args
+     */
+    public static void main(String[] args) throws Exception
+    {
+        Context c = new Context();
+        
+        TableRowIterator authorIter =  DatabaseManager.queryTable(c,
+                                            "ItemsByAuthor",
+                                            "SELECT * FROM ItemsByAuthor");
+        
+        while (authorIter.hasNext())
+        {
+            TableRow authorRow = authorIter.next();
+            String sortAuthor = authorRow.getStringColumn("sort_author");
+            String sortAuthorNormalized;
+            
+            // normalize the sort author
+            sortAuthorNormalized = Normalizer.normalize(sortAuthor, Normalizer.NFD)
+                                        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
+            
+            // if the sort author has been altered by the normalization, update the database
+            if (!sortAuthor.equals(sortAuthorNormalized))
+            {
+                authorRow.setColumn("sort_author", sortAuthorNormalized);
+                DatabaseManager.update(c, authorRow);
+            }
+        }
+
+        authorIter.close();
+        c.commit();
+    }
+}
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
