If you look in the Admin Interface on the screens where you create range indexes, there is a collation builder there. That can help you. Using the collation builder, it looks like this will be case and diacritic insensitive as well as space insensitive:
http://marklogic.com/collation/en/S1/T0020/AS Let’s check that in XQuery: xquery version "1.0-ml"; declare default collation "http://marklogic.com/collation/en/S1/T0020/AS"; "houston" eq "HOUSTON", "houston" eq "HOUSTON ", " houston" eq "HOUSTON " => true true true So that seems to work for space and caps. Here is the doc on collations: http://docs.marklogic.com/guide/search-dev/encodings_collations?hq=collation#id_40354 They are very powerful. As for the wildcards, I would guess that would work well. I would recommend the following indexes for wildcards: Word positions word searches three character word searches word positions word lexicon in the codepoint collation three character word positions See http://docs.marklogic.com/guide/search-dev/wildcard#id_14163 for more info. -Danny From: [email protected] [mailto:[email protected]] On Behalf Of Alexei Betin Sent: Thursday, January 08, 2015 5:27 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] function index in MarkLogic? So is there a way to create such case, whitespace and punctuation insensitive index? I tried using http://marklogic.com/collation/S1 as Geert suggested and that worked as far as capitalization goes (but it does not work e.g. for whitespace – ‘Houston ‘ would still be a miss). In addition, I had to specify the S1 collation at the app server level - ideally, I would want it to be at the field/index level. At any rate, handling capitalization/whitespace is only a small part of the story, other use cases include matching on a substring as in substring( ZipCode, 1, 3 ) = “921” (would wildcard search be faster than substring?) or a phonetic match as in spell:double-metaphone( substring-before( Email, '@' ) )[1] = “aptn” So for an ultimate solution, I am now looking at CPF/triggers as Justin suggested… Thanks! Alexei Betin Principal Architect; Big Data P: (817) 928-1643 | Elevate.com 4150 International Plaza, Suite 300 Fort Worth, TX 76109 Privileged and Confidential. This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain privileged and/or confidential information. If you have received this e-mail in error, please notify me immediately by a return e-mail and delete this e-mail. You are hereby notified that any dissemination, distribution or copying of this e-mail and/or any attachments thereto, is strictly prohibited. From: [email protected] [mailto:[email protected]] On Behalf Of Paul Hoehne Sent: Thursday, January 08, 2015 5:04 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] function index in MarkLogic? So one reason to do this, in the example given, is that (hypothetically) a city field in an address might have: HOUSTON, houston, Houston, HOuston, etc. Rather than using a computed field, you would use a case insensitive, maybe whitespace and punctuation insensitive index, so NEW YORK NEW YORK New YORK New York new york etc, would all be considered ‘equal.’ in terms of the index. While this doesn’t cover every instance where you might want a function index, it handles those cases where you’re worried about the case of the text, punctuation, or spaces. On Jan 8, 2015, at 2:47 PM, Alexei Betin <[email protected]> wrote: Hello, Does MarkLogic has something similar to “function index” as exists in other databases such as Oracle? For example, in Oracle one can create a function index on a value of UPPER(<field_name>) as follows: CREATE INDEX cities_fn_idx ON cities (UPPER(name)); which ensures that a query like the one below is efficient and not using the full table scan: SELECT name FROM cities WHERE UPPER(name) = 'HOUSTON'; I’d like to do something similar with MarkLogic which could help speed up, e.g. the following simple query (which currently runs prohibitively slow despite an index on /A/City): collection()/A/[upper-case( City ) = 'HOUSTON'] upper-case() is only an example, I am looking for a solution that would accommodate any function/transformation that derives a value to be used in a query condition from an existing element. Clearly, one solution would be to pre-compute the derived field, add it to all the documents in the collection, and create an index on the derived field – this is possible, but I want to make sure I am not missing a solution that does not require data modification (such as a “function index”). Thanks, <image005.jpg> <image006.jpg> Alexei Betin Principal Architect; Big Data P: (817) 928-1643 | Elevate.com 4150 International Plaza, Suite 300 Fort Worth, TX 76109 Privileged and Confidential. This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain privileged and/or confidential information. If you have received this e-mail in error, please notify me immediately by a return e-mail and delete this e-mail. You are hereby notified that any dissemination, distribution or copying of this e-mail and/or any attachments thereto, is strictly prohibited. _______________________________________________ General mailing list [email protected] http://developer.marklogic.com/mailman/listinfo/general _______________________________________________ General mailing list [email protected] http://developer.marklogic.com/mailman/listinfo/general
