On Wed, 30 Mar 2016 05:42:02 -0700, Blessing N <[email protected]>  
wrote:
...
> 1. Alphabetize letter by letter from A to Z.
> 2. Ignore the capitalization of letters.
> 3. Ignore mathematical symbols and any special characters that do not
> include a Latin letter.
> 4. Ignore punctuation.
> 5. Do not ignore spaces. Spaces follow the rule of "nothing precedes
> something".
>
> For example, the title "Do we go?" comes before the title "Does it
> count?", because the space after "Do" comes before any letter attached to
> the end of this string (in this case, the "es" in "Does").
>
> I managed to achieve all except #5 by creating a field with the following
> collation (I'm using search:search):
>
> http://marklogic.com/collation/en/S1/AS/T0000/NO
>
...
>
> Is there a collation that lets me keep spaces significant while
> ignoring punctuation and symbols?
>

I don't think you can get there. What you have is probably as good as it
gets, but it ignores spaces along with the punctuation and symbols.
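
To make the difference concrete, here is a minimal Python sketch of the two
orderings for the example titles. It does not use MarkLogic or ICU at all;
the two key functions are hypothetical helpers, and they only handle ASCII
letters, which is a simplification of rules 1-5.

```python
import re

def key_ignoring_spaces(title: str) -> str:
    # Keep only ASCII letters, lowercased. Spaces are dropped together
    # with punctuation and symbols.
    return re.sub(r"[^a-z]", "", title.lower())

def key_keeping_spaces(title: str) -> str:
    # Keep ASCII letters and spaces. A space compares lower than any
    # letter, which gives the "nothing precedes something" behaviour.
    kept = re.sub(r"[^a-z ]", "", title.lower())
    return re.sub(r" +", " ", kept).strip()

titles = ["Does it count?", "Do we go?"]

# Spaces ignored: "doesitcount" sorts before "dowego".
print(sorted(titles, key=key_ignoring_spaces))

# Spaces kept: "do we go" sorts before "does it count".
print(sorted(titles, key=key_keeping_spaces))
```

The first ordering is roughly what a collation that shifts every variable
character gives you; the second is the ordering rule 5 asks for.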
Deconstructing the collation URI for onlookers (see also
http://userguide.icu-project.org/collation/architecture):
en = use special English rules (mainly around handling of ae ligatures)
S1 = strength level 1: collapse case and diacritic variants
AS = shift variable characters, i.e. throw away variable characters from
the sort keys
T0000 = set variable top so all variable characters are shifted
NO = turn on normalization. You probably don't need this, actually.
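
The same breakdown as a small Python sketch, in case it helps to see the
URI taken apart mechanically. The function name describe_collation is made
up for illustration, and it only recognizes the kinds of options that
appear in this thread; it is not a parser for the full MarkLogic collation
syntax.

```python
from urllib.parse import urlparse

def describe_collation(uri: str) -> list:
    """Split a collation URI into (option, meaning) pairs.

    Only covers the options discussed in this message.
    """
    parts = [p for p in urlparse(uri).path.split("/") if p]
    if not parts or parts[0] != "collation":
        raise ValueError("not a collation URI: " + uri)

    described = []
    for option in parts[1:]:
        if option == "NO":
            meaning = "normalization on"
        elif option == "AS":
            meaning = "variable characters shifted"
        elif option == "AN":
            meaning = "variable characters not ignored"
        elif option == "T0000":
            meaning = "variable top covers all variable characters"
        elif option.startswith("T"):
            meaning = "variable top set to " + option[1:]
        elif option.startswith("S") and option[1:].isdigit():
            meaning = "strength level " + option[1:]
        elif option.islower():
            meaning = "language rules for " + option
        else:
            meaning = "not covered by this sketch"
        described.append((option, meaning))
    return described

for option, meaning in describe_collation(
    "http://marklogic.com/collation/en/S1/AS/T0000/NO"
):
    print(option, "=", meaning)
```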

The collations are based on the Unicode collation algorithm  
(http://unicode.org/reports/tr10/) as implemented by ICU.

The T parameter (variable top) sets the cutoff in the Default Unicode
Collation Element Table (DUCET; see here for the latest version:
http://www.unicode.org/Public/UCA/latest/allkeys.txt) up to which
characters count as variable. Variable characters are either shifted (AS),
which ignores them at levels 1-3 and leaves them only a level 4 weight
that S1 never looks at, or kept as ordinary sortable characters (AN).
Variable characters are things like whitespace and punctuation, plus
assorted symbols. But all the whitespace codepoints come before the
punctuation and symbol codepoints in the ordering, so any cutoff high
enough to catch punctuation also catches whitespace, and there isn't a
"variable bottom" parameter to exclude it.
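
A toy model of the cutoff in Python, to show why no setting of variable top
gets the requested behaviour. The class names and their order are a
deliberate simplification of the DUCET (an assumption for illustration, not
the real table), and ignored_classes is a hypothetical helper.

```python
# Character classes in collation order, lowest first (simplified).
CLASS_ORDER = ["whitespace", "punctuation", "symbol", "letter"]
VARIABLE = {"whitespace", "punctuation", "symbol"}

def ignored_classes(variable_top: str) -> set:
    """Classes that get shifted when the cutoff sits at variable_top.

    Everything variable at or below the cutoff is ignored; there is
    no way to start the ignored range anywhere but the bottom.
    """
    cutoff = CLASS_ORDER.index(variable_top)
    return {c for c in CLASS_ORDER[: cutoff + 1] if c in VARIABLE}

# What the original question asks for: ignore these, keep whitespace.
wanted = {"punctuation", "symbol"}

# Every ignored set that some cutoff can produce.
reachable = [ignored_classes(top) for top in CLASS_ORDER[:3]]

for top, ignored in zip(CLASS_ORDER[:3], reachable):
    print(top, "->", sorted(ignored))

# True if some cutoff yields exactly the wanted set.
print(wanted in reachable)
```

Every reachable set contains whitespace, because the ignored range always
starts at the bottom of the ordering.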

//Mary


