Hi David,
Unfortunately there's not yet an "okina-insensitive" query option. :)
If you run a punctuation-insensitive search for the phrase "O ahu" it will
match your three punctuated spellings (as well as O-ahu or O.ahu). Enabling the
fast phrase searches index will help keep that phrase lookup quick.
To eliminate the (probably rare) spurious punctuation hit you could do a
cts:or-query of the various spellings and set the query terms to
punctuation-sensitive. Then MarkLogic will use indexes to match the phrase "O
ahu" and internal filtering to verify the punctuation is correct. Since it'll
be pretty rare to have any other punctuation in there, your filter hit ratio
will be quite high so performance will remain good.
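As a rough sketch of that cts:or-query (the spelling list and the search over
fn:doc() are assumptions for illustration, and the unmarked "Oahu" is included
as its own term so it still matches):

```xquery
xquery version "1.0-ml";

(: Spellings to accept. Character references keep the quote marks unambiguous. :)
let $spellings := (
  "Oahu",          (: no mark :)
  "O'ahu",         (: ASCII apostrophe :)
  "O&#x2018;ahu",  (: left single quotation mark, U+2018 :)
  "O&#x2BB;ahu"    (: okina, U+02BB :)
)
(: One punctuation-sensitive word query per spelling, OR'ed together. :)
let $query :=
  cts:or-query(
    for $s in $spellings
    return cts:word-query($s, ("case-insensitive", "punctuation-sensitive"))
  )
(: cts:search filters by default, which is what checks the punctuation. :)
return cts:search(fn:doc(), $query)
```

Because the search is filtered, a stray "O-ahu" or "O.ahu" would be resolved
from the indexes as a candidate and then rejected during filtering.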
You can use a custom Hawaiian word thesaurus to take care of the cts:or-query
expansion.
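A minimal sketch of the thesaurus route, assuming a thesaurus document has
already been loaded at the hypothetical URI "/thesaurus/hawaiian.xml" with an
entry for "Oahu" whose synonyms are the other spellings. It's worth verifying
that the expanded query keeps the options you want:

```xquery
xquery version "1.0-ml";

import module namespace thsr = "http://marklogic.com/xdmp/thesaurus"
  at "/MarkLogic/thesaurus.xqy";

(: Hypothetical thesaurus URI; substitute wherever yours is stored. :)
let $thesaurus := "/thesaurus/hawaiian.xml"
let $term := "Oahu"
(: Start from the user's plain term, then OR in the thesaurus synonyms. :)
let $base := cts:word-query($term, ("case-insensitive", "punctuation-sensitive"))
let $expanded := thsr:expand($base, thsr:lookup($thesaurus, $term), (), (), ())
return cts:search(fn:doc(), $expanded)
```

That keeps the list of spellings in data rather than in code, so adding another
place name means adding a thesaurus entry and nothing else.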
-jh-
On Aug 26, 2010, at 7:23 PM, David Sewell wrote:
> Problem: we need to create a full-text search on a text that may include
> various spellings of Hawaiian names. Properly spelled, many Hawaiian
> place names include the "okina" or glottal stop. Technically it is
> Unicode U+02BB but is often represented by a single curly quote, U+2018,
> or just ASCII apostrophe. For example, the island of Oahu may be spelled
>
> Oahu
> O'ahu [apostrophe]
> O‘ahu [curly quote, U+2018]
> Oʻahu [okina, U+02BB]
>
> Now suppose all of those spellings are found in our data, and we want to
> implement a search that will match all of them when a user searches on
> "oahu".
>
> I can't think of any reasonable way to do this in MarkLogic.
>
> cts:word-query("oahu",
> ('case-insensitive','diacritic-insensitive','punctuation-insensitive'))
>
> matches only "Oahu". All the other spellings are tokenized on the
> special characters and are therefore not matched.
>
> Is there any obvious way to do this, short of duplicating the text with
> spellings normalized?
>
> --
> David Sewell, Editorial and Technical Manager
> ROTUNDA, The University of Virginia Press
> PO Box 400314, Charlottesville, VA 22904-4314 USA
> Email: [email protected] Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general