The only problem with relying on collations for normalization is that
there are always going to be use-case exceptions. For example, in the S1
collation below, "encyclopædia" (with the ligature "æ") equals
"encyclopaedia" but does not equal "encyclopedia". Yet the likely use
case is that when a user enters "encyclopedia", you want to match
spellings with any of e/ae/æ. There may be no way around a thesaurus
approach for issues like this.
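As a rough illustration of what a thesaurus approach could look like, here
is a minimal sketch in plain XQuery. The $thesaurus variable, its
entry/term layout, and the local:variants function are all hypothetical
names made up for this example, not part of any MarkLogic API; a real setup
would more likely keep the entries in a stored thesaurus document.

```xquery
xquery version "1.0";

(: Hypothetical in-memory thesaurus: each entry groups the spellings
   that should be treated as equivalent for search purposes. :)
declare variable $thesaurus :=
  <thesaurus>
    <entry>
      <term>encyclopedia</term>
      <term>encyclopaedia</term>
      <term>encyclopædia</term>
    </entry>
  </thesaurus>;

(: Return every spelling grouped with $word, or $word itself
   when no entry lists it. :)
declare function local:variants($word as xs:string) as xs:string*
{
  let $key := fn:lower-case($word)
  let $entry := $thesaurus/entry[term = $key]
  return
    if (fn:exists($entry))
    then $entry/term/fn:string(.)
    else $word
};

(: Expand the user's input into all spellings worth searching for. :)
local:variants("Encyclopedia")
```

The returned sequence could then be handed to whatever search call you use,
so that a hit on any one spelling counts as a match. The trade-off is that
every such word family has to be listed by hand.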

Danny Sokolosky wrote:
> Ian's collation idea is an excellent one. The case-insensitive,
> diacritic-insensitive option will work I think:
> default collation = "http://marklogic.com/collation//S1"
> "Ø" eq "O", fn:lower-case("Ø") eq "O"
> returns: true true
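
The ligature case can be checked the same way by passing the collation
explicitly to fn:compare. This is only a sketch: the $S1 variable is a name
introduced here, and the collation URI is the one from the quoted message.
Going by the behavior described above, the first comparison should come out
true and the second false.

```xquery
xquery version "1.0";

(: Collation URI taken from the quoted message; $S1 is just a local name. :)
declare variable $S1 := "http://marklogic.com/collation//S1";

(
  (: ligature versus the two-letter "ae" spelling :)
  fn:compare("encyclopædia", "encyclopaedia", $S1) eq 0,
  (: ligature versus the plain "e" spelling :)
  fn:compare("encyclopædia", "encyclopedia", $S1) eq 0
)
```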
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [EMAIL PROTECTED] Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/