Re: [MarkLogic Dev General] function index in MarkLogic?

Danny Sokolsky Thu, 08 Jan 2015 17:48:10 -0800

If you look in the Admin Interface on the screens where you create range 
indexes, there is a collation builder there.  That can help you.  Using the 
collation builder, it looks like this will be case and diacritic insensitive as 
well as space insensitive:



http://marklogic.com/collation/en/S1/T0020/AS

Let’s check that in XQuery:

xquery version "1.0-ml";

declare default collation "http://marklogic.com/collation/en/S1/T0020/AS";;

"houston" eq "HOUSTON",
"houston" eq "HOUSTON ",
"   houston" eq "HOUSTON  "

=>
true 
true
true

So that seems to work for space and caps.

Here is the doc on collations:

http://docs.marklogic.com/guide/search-dev/encodings_collations?hq=collation#id_40354

They are very powerful.

As for the wildcards, I would guess that would work well.  I would recommend 
the following indexes for wildcards:

Word positions

word searches
three character word searches
word positions
word lexicon in the codepoint collation
three character word positions

See http://docs.marklogic.com/guide/search-dev/wildcard#id_14163 for more info.

-Danny


From: [email protected] 
[mailto:[email protected]] On Behalf Of Alexei Betin
Sent: Thursday, January 08, 2015 5:27 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] function index in MarkLogic?

So is there a way to create such case, whitespace and punctuation insensitive 
index? I tried using http://marklogic.com/collation/S1 as Geert suggested and 
that worked as far as capitalization goes (but it does not work e.g. for 
whitespace – ‘Houston ‘ would still be a miss). 

In addition, I had to specify the S1 collation at the app server level - 
ideally, I would want it to be at the field/index level. 

At any rate, handling capitalization/whitespace is only a small part of the 
story, other use cases include matching on a substring as in
substring( ZipCode, 1, 3 ) = “921”
(would wildcard search be faster than substring?) or a phonetic match as in
spell:double-metaphone( substring-before( Email, '@' )  )[1] = “aptn”

So for an ultimate solution, I am now looking at CPF/triggers as Justin 
suggested…

Thanks!





Alexei Betin

Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com
4150 International Plaza, Suite 300
Fort Worth, TX 76109

Privileged and Confidential. This e-mail, and any attachments thereto, is 
intended only for use by the addressee(s) named herein and may contain 
privileged and/or confidential information. If you have received this e-mail in 
error, please notify me immediately by a return e-mail and delete this e-mail. 
You are hereby notified that any dissemination, distribution or copying of this 
e-mail and/or any attachments thereto, is strictly prohibited. 

From: [email protected] 
[mailto:[email protected]] On Behalf Of Paul Hoehne
Sent: Thursday, January 08, 2015 5:04 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] function index in MarkLogic?

So one reason to do this, in the example given, is that (hypothetically) a city 
field in an address might have:

HOUSTON, houston, Houston, HOuston, etc.

Rather than using a computed field, you would use a case insensitive, maybe 
whitespace and punctuation insensitive index, so
NEW YORK
NEW   YORK
New YORK
New York
new york
etc, would all be considered ‘equal.’ in terms of the index.  

While this doesn’t cover every instance where you might want a function index, 
it handles those cases where you’re worried about the case of the text, 
punctuation, or spaces.  


On Jan 8, 2015, at 2:47 PM, Alexei Betin <[email protected]> wrote:

Hello,
 
Does MarkLogic has something similar to “function index” as exists in other 
databases such as Oracle? For example, in Oracle one can create a function 
index on a value of UPPER(<field_name>) as follows:

CREATE INDEX cities_fn_idx ON cities (UPPER(name));
which ensures that a query like the one below is efficient and not using the 
full table scan:
SELECT name FROM cities WHERE UPPER(name) = 'HOUSTON';
 
I’d like to do something similar with MarkLogic which could help speed up, e.g. 
the following simple query (which currently runs prohibitively slow despite an 
index on /A/City):
 
collection()/A/[upper-case( City ) = 'HOUSTON']
 
upper-case() is only an example, I am looking for a solution that would 
accommodate any function/transformation that derives a value to be used in a 
query condition from an existing element.
 
Clearly, one solution would be to pre-compute the derived field, add it to all 
the documents in the collection, and create an index on the derived field – 
this is possible, but I want to make sure I am not missing a solution that does 
not require data modification (such as a “function index”).
 
Thanks,
 
<image005.jpg>
<image006.jpg>


Alexei Betin

Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com
4150 International Plaza, Suite 300
Fort Worth, TX 76109
 
Privileged and Confidential. This e-mail, and any attachments thereto, is 
intended only for use by the addressee(s) named herein and may contain 
privileged and/or confidential information. If you have received this e-mail in 
error, please notify me immediately by a return e-mail and delete this e-mail. 
You are hereby notified that any dissemination, distribution or copying of this 
e-mail and/or any attachments thereto, is strictly prohibited. 
 
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

Re: [MarkLogic Dev General] function index in MarkLogic?

Reply via email to