Re: RE: [MarkLogic Dev General] Unicode flattening for noncombined characters

David Sewell Mon, 21 Jul 2008 11:18:37 -0700

The only problem with relying on collations for normalization is that
there are always going to be use-case exceptions. For example, in the S1
collation below, "encyclopædia" (with the ligature "æ") equals
"encyclopaedia", but does not equal "encyclopedia". But the likely
requirement is that when a user enters "encyclopedia", you want to match
spellings with any of e/ae/æ. There may be no way around a thesaurus
approach for issues like this.
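
To make the thesaurus idea concrete, here is a minimal sketch in plain
Python rather than MarkLogic's own thesaurus functions. The table
SPELLING_GROUPS and the function expand_term are hypothetical names made
up for illustration, and the single entry is just the example from this
thread. The idea is only that each search term is replaced by its whole
group of accepted spellings before the query is run.

```python
# Hypothetical thesaurus table: each entry is a group of spellings that
# should be treated as equivalent at search time. A real application would
# load these from maintained thesaurus data instead of hard-coding them.
SPELLING_GROUPS = [
    ["encyclopedia", "encyclopaedia", "encyclopædia"],
]


def expand_term(term):
    """Return every spelling in the group containing `term`.

    Matching is case-insensitive. A term that appears in no group is
    returned on its own (lowercased), so callers can always iterate
    over the result.
    """
    key = term.lower()
    for group in SPELLING_GROUPS:
        if key in group:
            return list(group)
    return [key]


if __name__ == "__main__":
    # Whichever spelling the user types, the search gets all three.
    for spelling in expand_term("Encyclopedia"):
        print(spelling)
```

The expanded list would then be turned into an OR query over the
variants. Collation still handles case and diacritics; the table only has
to cover the exceptions that no collation strength will equate, such as
e versus ae.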


Danny Sokolosky wrote:

> Ian's collation idea is an excellent one. The case-insensitive,
> diacritic-insensitive option will work I think:

> default collation = "http://marklogic.com/collation//S1";

> "Ø" eq "O", fn:lower-case("Ø") eq "O"

> returns: true true

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [EMAIL PROTECTED]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
