Re: [MarkLogic Dev General] root collation vs unicode collation in terms of performance

Mary Holstege Tue, 23 Aug 2016 10:20:51 -0700

On Tue, 23 Aug 2016 08:46:40 -0700, Tim Meagher <t...@aaom.net> wrote:


> Just wondering why MarkLogic does not make codepoint the default
> collation if it results in a 10% performance improvement.
>
>
> Tim

Let's not confuse the default appserver collation with the collation you
might want to use on a range index or word lexicon.
There is no default for a range index or word lexicon: you need to pick one
when you configure them, and you should pick whichever gives you the proper
balance of functionality and performance.

The 10% faster stat was a measurement of running through the entire range
index comparing every value, and it was made some time ago. It may have
shifted a bit since then, because we've done various work optimizing
collations and various lexicon operations. There are, however, cases where
in practice the root collation is faster because it has smaller ranges of
values to look at. For example, suppose you are doing a case-insensitive,
diacritic-insensitive comparison using a codepoint collation word lexicon.
The variants can be widely separated in codepoint order, and there are
theoretical variants in the exciting reaches of Unicode that you have to
make sure you look for, so you end up looking at a lot of needless cruft;
in the root collation, those variants are all sorted contiguously. So the
general rule of performance still applies: measure, because it is never
what you think. Performance stats here are highly data and operation
dependent.
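A small Python sketch (not MarkLogic code) can illustrate the scattering effect described above. It assumes a made-up lexicon, and its `fold()` helper is only a rough stand-in for case- and diacritic-insensitive matching (NFD decomposition, dropping combining marks, then `str.casefold()`); it is not the Unicode Collation Algorithm that the real root collation is built on. The hypothetical `scan_width()` counts how many sorted entries lie between the first and last variant of a word, which is roughly the range a lookup has to walk.

```python
import unicodedata


def fold(s: str) -> str:
    """Rough case- and diacritic-insensitive key (an assumption for this
    sketch, NOT the Unicode Collation Algorithm behind the root collation)."""
    decomposed = unicodedata.normalize("NFD", s)  # split é into e + U+0301
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def scan_width(ordered: list[str], target: str) -> int:
    """Entries between the first and last variant of `target`, inclusive:
    roughly how much of the sorted list a lookup for all variants must walk."""
    positions = [i for i, w in enumerate(ordered) if fold(w) == fold(target)]
    return positions[-1] - positions[0] + 1


# Hypothetical word lexicon with case and diacritic variants of "resume".
lexicon = ["resume", "Resume", "R\u00c9SUM\u00c9", "r\u00e9sum\u00e9",
           "apple", "Apple", "banana", "mango", "quiche", "Zebra", "zebra"]

# Python compares str values by code point, like a codepoint collation.
codepoint_order = sorted(lexicon)
# Sorting on the folded key (ties broken by code point) keeps variants adjacent.
folded_order = sorted(lexicon, key=lambda w: (fold(w), w))

print(scan_width(codepoint_order, "resume"))  # 9
print(scan_width(folded_order, "resume"))     # 4
```

With this toy data the four variants of "resume" are spread over 9 entries in code point order but sit in 4 adjacent entries under the folded key. Real lexicons and the real collations will behave differently, which is exactly why measuring matters.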

The other thing to keep in mind is that the appserver default collation is
what is used for basic comparisons and order by in your modules, and
codepoint ordering makes no sense to normal humans, who don't want to see
Zimmerman sorted before deYoung when they sort names just because the
codepoints for uppercase letters come first.
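A quick Python illustration of that ordering problem, using made-up names. Python's default string comparison is by code point, so `sorted()` behaves like a codepoint collation here; the second function is only a rough case-insensitive approximation of a human-friendly order, not the root collation itself.

```python
def codepoint_order(names: list[str]) -> list[str]:
    # Plain sorted() compares code points: every uppercase ASCII letter
    # (U+0041..U+005A) sorts before every lowercase one (U+0061..U+007A).
    return sorted(names)


def case_insensitive_order(names: list[str]) -> list[str]:
    # Approximation of a human-friendly order: compare case-folded text,
    # falling back to code points only to break exact ties.
    return sorted(names, key=lambda n: (n.casefold(), n))


names = ["deYoung", "Darwin", "Zimmerman", "adams"]
print(codepoint_order(names))         # ['Darwin', 'Zimmerman', 'adams', 'deYoung']
print(case_insensitive_order(names))  # ['adams', 'Darwin', 'deYoung', 'Zimmerman']
```

In code point order the lowercase-initial names land after every capitalized one, which is not what anyone sorting a list of people expects.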

//Mary

>
>
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Yalaverthi,
> Sudheer (LNG-RDU)
> Sent: Tuesday, August 23, 2016 11:27 AM
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Subject: Re: [MarkLogic Dev General] root collation vs unicode collation in
> terms of performance
>
>
> Hi,
>
>
> If anyone can share their experience or knowledge of which one
> performs better, it would be very helpful.
>
>
> Thanks.
>
>
> -Sudheer
>
>
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Yalaverthi,
> Sudheer (LNG-RDU)
> Sent: Monday, August 22, 2016 2:31 PM
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Subject: [MarkLogic Dev General] root collation vs unicode collation in
> terms of performance
>
>
> Hi,
>
>
>
> In one of the older developer community threads
> (<http://developer.marklogic.com/pipermail/general/2012-March/009981.html>),
> I found this statement from Mary:
>
>
> "If you are not collapsing values, the codepoint collation
>
> is generally about 10% faster in its operations."
>
>
>
> We have a few elements for which we need range indexes, but these elements
> do not contain any diacritic-sensitive information; they just store GUIDs
> or similar values. I was initially thinking of using root collation
> indexes for them, but after reading the above thread I wondered whether I
> should be using the codepoint collation for better performance. Since
> these elements do not have diacritic-sensitive information anyway, I
> wonder whether root collation performance will be on par with codepoint.
>
>
> Let me know which one is better in this scenario.
>
>
>
> Thanks,
>
> Sudheer
>
>
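For the GUID case in the original question, one quick sanity check is whether case- or diacritic-insensitive comparison would collapse anything at all in the data. The Python sketch below assumes the values are available as plain strings and reuses a rough `fold()` approximation (NFD, strip combining marks, casefold), which is not MarkLogic's root collation; `nothing_to_collapse()` is a hypothetical helper and the GUIDs are made up. It only tells you whether the two orderings differ under that approximation; whether codepoint is actually faster for a given range index still has to be measured.

```python
import unicodedata


def fold(s: str) -> str:
    # Rough case- and diacritic-insensitive key (assumption, not the UCA).
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def nothing_to_collapse(values: list[str]) -> bool:
    """True when no two distinct values share a folded key and sorting by
    the folded key gives the same order as sorting by code point."""
    distinct = set(values)
    if len({fold(v) for v in distinct}) != len(distinct):
        return False  # some values differ only by case or diacritics
    return sorted(distinct) == sorted(distinct, key=fold)


# Lowercase hex GUIDs: fold() leaves them unchanged, so the orders agree.
guids = ["3f2504e0-4f89-11d3-9a0c-0305e82c3301",
         "0b6e2f1a-9c4d-4e8b-a1f2-7d3c5b9e0a11",
         "c9a1d7e2-5b3f-4a60-8e24-1f7b9d0c6e35"]
print(nothing_to_collapse(guids))               # True
print(nothing_to_collapse(["ABC-1", "abc-1"]))  # False
print(nothing_to_collapse(["B", "a"]))          # False
```

When this returns True, the collapsing behavior of the root collation buys nothing for that data, so the choice comes down to measured performance rather than functionality.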


_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
