Problem: we need to create a full-text search on a text that may include
various spellings of Hawaiian names. Properly spelled, many Hawaiian
place names include the "okina", or glottal stop. Technically it is
Unicode U+02BB, but it is often represented by a single curly quote,
U+2018, or just an ASCII apostrophe. For example, the island of Oahu may
be spelled:

    Oahu
    O'ahu [apostrophe]
    O‘ahu [curly quote, U+2018]
    Oʻahu [okina, U+02BB]

Now suppose all of those spellings are found in our data, and we want to
implement a search that will match all of them when a user searches on
"oahu".
I can't think of any reasonable way to do this in MarkLogic.

    cts:word-query("oahu",
      ('case-insensitive','diacritic-insensitive','punctuation-insensitive'))

matches only "Oahu". All the other spellings are tokenized on the
special characters and are therefore not matched.
Is there any obvious way to do this, short of duplicating the text with
spellings normalized?
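
To make the fallback concrete, here is a minimal sketch of what I mean by normalizing the spellings. It is written in Python purely for illustration and is not MarkLogic code. The names OKINA_VARIANTS and normalize_okina are hypothetical, and including U+2019 (the right curly quote) is my own assumption, since it seems likely to turn up alongside U+2018.

```python
# Illustration only: collapse the okina variants so every spelling of a
# name reduces to the same searchable form. Not MarkLogic code.

# ASCII apostrophe, left curly quote, right curly quote (assumed), okina.
OKINA_VARIANTS = "'\u2018\u2019\u02bb"

# Translation table that deletes each of the characters above.
_STRIP_TABLE = str.maketrans("", "", OKINA_VARIANTS)


def normalize_okina(text: str) -> str:
    """Return text with okina-like characters removed and case folded."""
    return text.translate(_STRIP_TABLE).lower()


spellings = ["Oahu", "O'ahu", "O\u2018ahu", "O\u02bbahu"]
normalized = [normalize_okina(s) for s in spellings]
```

The obvious drawback is that this removes every apostrophe, so English words such as "don't" would be altered as well, which is part of why I would rather not maintain a normalized duplicate of the text.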
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: [email protected] Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/