On Wed, 30 Mar 2016 05:42:02 -0700, Blessing N <[email protected]> wrote:

...
> 1.Alphabetize letter by letter from A to Z.
> 2.Ignore the capitalization of letters
> 3.Ignore mathematical symbols and any special characters that do not
> include a Latin letter
> 4.Ignore punctuation
> 5. Do not ignore spaces. Spaces follow the rule of "nothing precedes
> something"
>
> For example the title "Do we go?" comes before the title "Does it
> count?",because the space after "Do" comes before any letter attached to
> the end of this string (in this case, the "es" in "Does").
>
> I managed to achieve all except #5 by creating a field with the following
> collation(I'm using search:search)
>
> http://marklogic.com/collation/en/S1/AS/T0000/NO
> ...
>
> Is there a collation where I could define to not ignore spaces while
> ignoring punctuation and symbols?
>
I don't think you can get there. What you have is probably as good as you can get, but it will also ignore spaces.

Deconstructing this for onlookers (see also http://userguide.icu-project.org/collation/architecture):

en = use special English rules (mainly around handling of ae ligatures)
S1 = collapse case and diacritic variants
AS = shift variable characters, i.e. throw away variable characters from the sort keys
T0000 = set variable top so all variable characters are shifted
NO = turn on normalization. You probably don't need this, actually.

The collations are based on the Unicode collation algorithm (http://unicode.org/reports/tr10/) as implemented by ICU. The T parameter (variable top) gives the cutoff in the default Unicode collation element table (DUCET; see http://www.unicode.org/Public/UCA/latest/allkeys.txt for the latest version) up to which variable characters will be either shifted (AS) or treated as level 4 (AN). Variable characters are things like whitespace and punctuation plus other random weird stuff. But all the whitespace codepoints are before the symbol codepoints in the ordering, and there isn't a "variable bottom" parameter, so any variable top high enough to shift punctuation and symbols also shifts whitespace.

//Mary
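
For onlookers who want to see the variable-top argument in miniature, here is a rough Python sketch. It is a toy model only, not MarkLogic or ICU code: the three-class ordering in VARIABLE_ORDER, the char_class helper, and the sort_key function are all made up for illustration, and real DUCET weights are far more detailed. It only assumes what is said above, namely that whitespace comes first among the variable characters and that the cutoff is a single upper bound.

```python
import unicodedata

# Hypothetical, simplified ordering of the variable classes:
# whitespace first, then punctuation, then symbols.
VARIABLE_ORDER = ["whitespace", "punctuation", "symbol"]
RANK = {"whitespace": 0, "punctuation": 1, "symbol": 2, "other": 3}


def char_class(ch):
    """Classify a character into one of the toy classes."""
    if ch.isspace():
        return "whitespace"
    cat = unicodedata.category(ch)
    if cat.startswith("P"):
        return "punctuation"
    if cat.startswith("S"):
        return "symbol"
    return "other"


def shifted_classes(variable_top):
    """Classes at or below the cutoff are shifted (dropped from the key)."""
    if variable_top is None:
        return set()
    return set(VARIABLE_ORDER[: VARIABLE_ORDER.index(variable_top) + 1])


def sort_key(text, variable_top):
    """Case-insensitive key; unshifted variable characters sort before letters."""
    dropped = shifted_classes(variable_top)
    return [
        (RANK[char_class(ch)], ch)
        for ch in text.casefold()
        if char_class(ch) not in dropped
    ]


titles = ["Does it count?", "Do we go?"]

# Everything variable is shifted: spaces vanish, so "doesitcount" < "dowego".
all_shifted = sorted(titles, key=lambda t: sort_key(t, "symbol"))

# Nothing is shifted: the space after "Do" sorts before the "e" in "Does".
none_shifted = sorted(titles, key=lambda t: sort_key(t, None))

# No single cutoff drops punctuation while keeping whitespace.
possible = [
    top
    for top in [None] + VARIABLE_ORDER
    if "punctuation" in shifted_classes(top)
    and "whitespace" not in shifted_classes(top)
]
```

In this toy model, all_shifted puts "Does it count?" first and none_shifted puts "Do we go?" first, while possible comes out empty: with only an upper cutoff there is no setting that ignores punctuation but keeps spaces.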
