Re: [MarkLogic Dev General] root collation vs unicode collation in terms of performance

Christopher Hamlin Tue, 23 Aug 2016 09:01:52 -0700

The old email from Mary explains more than I would know.

My takeaway, take it for what it's worth, is:


Use the correct collation for your 'text' indexes; use codepoint for
'value' indexes like UIDs and such.

You won't be worrying about whether your UID has diacritics, or whether it
sorts according to German or French rules (I'm guessing).

My experience is that codepoint is faster, but how much it matters depends
on what your queries do and how big your indexes are.

If the indexes are small, or you only ever get the first value, it may not
matter (unless they grow...).

If you have huge indexes across many nodes and do things that require
sorting/unique-ifying then it can matter.

For something like a UUID, for example, it sounds like codepoint is the way
to go.
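
To illustrate the text-versus-value distinction above, here is a minimal
Python sketch. It is not MarkLogic code: simplified_linguistic_key is a
hypothetical, crude stand-in for a language-aware collation (it is not the
real Unicode Collation Algorithm or MarkLogic's root collation), and the
sample words and UUIDs are made up. It only shows that a codepoint
comparison needs no transformation at all, and that for lowercase UUID-like
values both orderings agree, so the extra work buys nothing.

```python
import unicodedata
import uuid


def codepoint_key(value: str) -> str:
    # Codepoint ordering: compare raw Unicode code points, no transformation.
    return value


def simplified_linguistic_key(value: str) -> str:
    # ASSUMPTION: a crude stand-in for a linguistic collation, not real UCA.
    # Decompose, drop combining marks (diacritics), then case-fold.
    decomposed = unicodedata.normalize("NFD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# 'Text' values: the two orderings disagree ("\u00c9clair" is E-acute + "clair").
words = ["zebra", "\u00c9clair", "apple", "eclair"]
by_codepoint = sorted(words, key=codepoint_key)
by_linguistic = sorted(words, key=simplified_linguistic_key)
print(by_codepoint)   # accented word sorts last: its first code point is U+00C9
print(by_linguistic)  # accented word sorts next to "eclair"

# 'Value' data: lowercase hex UUID strings sort identically either way.
ids = [str(uuid.UUID(int=n)) for n in (255, 16, 4096, 1)]
same_order = sorted(ids, key=codepoint_key) == sorted(ids, key=simplified_linguistic_key)
print(same_order)
```

For the words the two orderings differ, which is why a 'text' index wants a
real collation. For the UUIDs they are identical, so the cheaper comparison
gives the same answer.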

A little discussion here:

https://docs.marklogic.com/guide/search-dev/encodings_collations#id_70034

If you have data, you can test it for your own situation and see.


On Tue, Aug 23, 2016 at 11:46 AM, Tim Meagher <t...@aaom.net> wrote:

> Just wondering why MarkLogic does not make codepoint the default collation
> if it results in a 10% performance improvement…
>
>
>
> Tim
>
>
>
> *From:* general-boun...@developer.marklogic.com [mailto:general-bounces@
> developer.marklogic.com] *On Behalf Of *Yalaverthi, Sudheer (LNG-RDU)
> *Sent:* Tuesday, August 23, 2016 11:27 AM
> *To:* MarkLogic Developer Discussion <general@developer.marklogic.com>
> *Subject:* Re: [MarkLogic Dev General] root collation vs unicode
> collation in terms of performance
>
>
>
> Hi,
>
>
>
> If anyone can share their experiences or knowledge in terms of which one
> works better in terms of performance, it will be very helpful.
>
>
>
> Thanks.
>
>
>
> -Sudheer
>
>
>
> *From:* general-boun...@developer.marklogic.com [mailto:general-bounces@
> developer.marklogic.com <general-boun...@developer.marklogic.com>] *On
> Behalf Of *Yalaverthi, Sudheer (LNG-RDU)
> *Sent:* Monday, August 22, 2016 2:31 PM
> *To:* MarkLogic Developer Discussion <general@developer.marklogic.com>
> *Subject:* [MarkLogic Dev General] root collation vs unicode collation in
> terms of performance
>
>
>
> Hi,
>
>
>
>
>
> In one of the older developer community threads here
> <http://developer.marklogic.com/pipermail/general/2012-March/009981.html>,
> I found this statement from Mary.
>
>
>
> “If you are not collapsing values, the codepoint collation is generally
> about 10% faster in its operations.”
>
>
>
>
>
> We have a few elements for which we need range indexes, but these elements do
> not hold any diacritic-sensitive information; they just store GUIDs or
> similar values. I was initially thinking of using root collation
> indexes for them. But after reading the above thread, I wonder whether
> I should be using the codepoint collation for better performance. Since these
> elements do not hold diacritic-sensitive information anyway, I wonder whether
> root collation performance will be on par with codepoint.
>
>
>
> Let me know which one is better in this scenario.
>
>
>
>
>
> Thanks,
>
> Sudheer
>
>
>
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
>

