Re: [MarkLogic Dev General] function index in MarkLogic?

Michael Blakeley Thu, 08 Jan 2015 18:01:07 -0800

Specifying a collation at the app-server doesn't fully solve the problem. If a 
collation is the right tool for the job, it needs to be part of the index and 
again as part of the query. It's important to understand the difference between 
specifying a collation in a range index, in a FLWOR expression, in the module 
scope, and in the app-server scope: 
https://docs.marklogic.com/guide/search-dev/encodings_collations touches on 
this.


Collations are a powerful tool for handling equality tests with or without 
case-sensitivity, whitespace-sensitivity, etc. If you can't get the job done 
with a collation, add the value you want as a new element. If necessary, add a 
range index on that new element. Personally I avoid using CPF or triggers if at 
all possible, but that's a finer point.

If you need substring matching, use wildcards or cts:value-match.

-- Mike

> On 8 Jan 2015, at 17:27 , Alexei Betin <[email protected]> wrote:
> 
> So is there a way to create such case, whitespace and punctuation insensitive 
> index? I tried usinghttp://marklogic.com/collation/S1 as Geert suggested and 
> that worked as far as capitalization goes (but it does not work e.g. for 
> whitespace – ‘Houston ‘ would still be a miss).
>  
> In addition, I had to specify the S1 collation at the app server level - 
> ideally, I would want it to be at the field/index level.
>  
> At any rate, handling capitalization/whitespace is only a small part of the 
> story, other use cases include matching on a substring as in
> substring( ZipCode, 1, 3 ) = “921”
> (would wildcard search be faster than substring?) or a phonetic match as in
> spell:double-metaphone( substring-before( Email, '@' )  )[1] = “aptn”
>  
> So for an ultimate solution, I am now looking at CPF/triggers as Justin 
> suggested…
>  
> Thanks!
>  
> <image001.gif>
> <image002.gif>
> Alexei Betin
> Principal Architect; Big Data
> P: (817) 928-1643 | Elevate.com
> 4150 International Plaza, Suite 300
> Fort Worth, TX 76109
>  
> Privileged and Confidential. This e-mail, and any attachments thereto, is 
> intended only for use by the addressee(s) named herein and may contain 
> privileged and/or confidential information. If you have received this e-mail 
> in error, please notify me immediately by a return e-mail and delete this 
> e-mail. You are hereby notified that any dissemination, distribution or 
> copying of this e-mail and/or any attachments thereto, is strictly 
> prohibited. 
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Paul Hoehne
> Sent: Thursday, January 08, 2015 5:04 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] function index in MarkLogic?
>  
> So one reason to do this, in the example given, is that (hypothetically) a 
> city field in an address might have:
>  
> HOUSTON, houston, Houston, HOuston, etc.
>  
> Rather than using a computed field, you would use a case insensitive, maybe 
> whitespace and punctuation insensitive index, so
> NEW YORK
> NEW   YORK
> New YORK
> New York
> new york
> etc, would all be considered ‘equal.’ in terms of the index.  
>  
> While this doesn’t cover every instance where you might want a function 
> index, it handles those cases where you’re worried about the case of the 
> text, punctuation, or spaces.  
>  
>  
> On Jan 8, 2015, at 2:47 PM, Alexei Betin <[email protected]> wrote:
>  
> Hello,
>  
> Does MarkLogic has something similar to “function index” as exists in other 
> databases such as Oracle? For example, in Oracle one can create a function 
> index on a value of UPPER(<field_name>) as follows:
> 
> CREATE INDEX cities_fn_idx ON cities (UPPER(name));
> which ensures that a query like the one below is efficient and not using the 
> full table scan:
> SELECT name FROM cities WHERE UPPER(name) = 'HOUSTON';
>  
> I’d like to do something similar with MarkLogic which could help speed up, 
> e.g. the following simple query (which currently runs prohibitively slow 
> despite an index on /A/City):
>  
> collection()/A/[upper-case( City ) = 'HOUSTON']
>  
> upper-case() is only an example, I am looking for a solution that would 
> accommodate any function/transformation that derives a value to be used in a 
> query condition from an existing element.
>  
> Clearly, one solution would be to pre-compute the derived field, add it to 
> all the documents in the collection, and create an index on the derived field 
> – this is possible, but I want to make sure I am not missing a solution that 
> does not require data modification (such as a “function index”).
>  
> Thanks,
>  
> <image005.jpg>
> <image006.jpg>
> Alexei Betin
> Principal Architect; Big Data
> P: (817) 928-1643 | Elevate.com
> 4150 International Plaza, Suite 300
> Fort Worth, TX 76109
>  
> Privileged and Confidential. This e-mail, and any attachments thereto, is 
> intended only for use by the addressee(s) named herein and may contain 
> privileged and/or confidential information. If you have received this e-mail 
> in error, please notify me immediately by a return e-mail and delete this 
> e-mail. You are hereby notified that any dissemination, distribution or 
> copying of this e-mail and/or any attachments thereto, is strictly 
> prohibited. 
>  
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>  
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

Re: [MarkLogic Dev General] function index in MarkLogic?

Reply via email to