The other avenue you can explore is a range query 
(cts:element-range-query or cts:element-attribute-range-query).  These require 
range indexes on the supplied element or attribute, and for strings they 
compare values according to collation order.  You might be able to get the 
semantics you are looking for by using <, >, or one of the other comparison 
operators as the range-query operator.
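
As a rough, untested sketch of the range-query idea: a "starts with 139" test 
can be approximated with a lower and an upper bound.  This assumes a string 
range index on xm:field that uses the codepoint collation (where ":" is the 
character immediately after "9"), and the namespace URI below is only a 
placeholder, since I don't know what xm is bound to in your data.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace URI; substitute whatever xm: is bound to. :)
declare namespace xm = "http://example.com/xm";

(: Assumes a string range index on xm:field with the codepoint collation. :)
let $collation := "collation=http://marklogic.com/collation/codepoint"
return
  cts:search(fn:doc(),
    cts:and-query((
      (: lower bound: "139" and everything that sorts after it :)
      cts:element-range-query(xs:QName("xm:field"), ">=", "139", $collation),
      (: upper bound: ":" follows "9" in codepoint order, so "13:" sorts
         after every string that begins with "139" :)
      cts:element-range-query(xs:QName("xm:field"), "<", "13:", $collation)
    )))
```

Note that the two range queries are evaluated independently, so a fragment 
containing several xm:field elements could match even when no single element 
satisfies both bounds.  With a different collation the upper bound would need 
rethinking.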

Also, if you are using 3.2, the index advice I gave you is not applicable.  
Maybe that will be a good reason for you to upgrade to 4.1...

I am not sure what you mean about 1 or 2 char "lexicons".  Presumably you mean 
wildcards.  The lexicon queries are the ones that return values right out of 
the range indexes (cts:element-values, et al).  Like I said, in 4.1-3 and 
later, you should not need those as long as you have 3-char wildcard and a 
codepoint word lexicon enabled.
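
For reference, a lexicon-style lookup would look something like the sketch 
below.  It is untested, assumes a string range index on xm:field, and again 
uses a placeholder namespace URI.  cts:element-value-match is the 
pattern-matching sibling of cts:element-values; check that it is available in 
the release you are running.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace URI; substitute whatever xm: is bound to. :)
declare namespace xm = "http://example.com/xm";

(: Returns the distinct indexed values of xm:field that match the pattern.
   Requires a string range index on xm:field. :)
cts:element-value-match(xs:QName("xm:field"), "139*")
```

This returns values, not documents, so it is only a fit if a list of matching 
values is what you are after.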

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Dominic Beesley
Sent: Wednesday, February 10, 2010 2:23 AM
To: 'General Mark Logic Developer Discussion'
Subject: RE: [MarkLogic Dev General] element-value-query with wildcards

Thanks to everyone who has replied, I can't believe I missed that last 
paragraph in the documentation.

I'll have a play with doing a word search instead of a value search as well. 
Though I would like the items to be in the right order... I could always filter 
the results afterwards.

We're still on ML 3.* but I'll follow the advice and knock back the indexing 
options - it was really just desperation!

What I'm not sure about is will I still need the one/two char lexicons to 
search for smaller items. This element can contain any old numbers in the 
string so could be for instance 1/89/EC - would that work better with "one" and 
"two character lexicons" to match with "1 89 *" ? In this application there 
will be hundreds of such searches a second across a large database, speed is 
likely more important than keeping the size down.

Also I'm not sure that the combination of the two parts of the search is 
optimal - is there a better way of specifying an element with a particular 
attribute AND a particular value?

Thanks again for the help

Dom

From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: 09 February 2010 19:07
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] element-value-query with wildcards

Dom,

Doug's suggestion of changing the search string to match multiple words is a 
good one.  For example, consider the following query:

let $x := <a>123/456/abc</a>
return
(
cts:contains($x, cts:element-value-query(xs:QName("a"), "123* *",
               ("wildcarded"))),
cts:contains($x, cts:element-value-query(xs:QName("a"), "123*",
               ("wildcarded")))
)
=> true
     false

Another thing:  you mentioned that you turned on all wildcard indexes.  That is 
probably not your best bet.  I would recommend the following (assuming you are 
using 4.1-3 or greater) index settings for a good balance between good wildcard 
performance and database size:

stemmed searches
word searches
word positions
three character searches
three character word positions
word lexicon (unicode collation)

You can add other options as needed (such as case-sensitive and 
diacritic-sensitive), but for most content sets, you should not need the 2 and 
1 character searches or the trailing wildcards if you have these options 
enabled.

-Danny



From: [email protected] 
[mailto:[email protected]] On Behalf Of Glidden, Douglass A
Sent: Tuesday, February 09, 2010 10:27 AM
To: [email protected]
Subject: RE: [MarkLogic Dev General] element-value-query with wildcards

Dom,

I would guess that this issue has to do with word boundaries.  When you do an 
element-value-query for '139*', you are searching for only elements whose 
values contain a single word starting with '139' (because '*' does not match on 
word boundaries), but the slashes in your desired value are probably being 
interpreted as word boundaries.  There are a couple of options here, I think.  
One would be to change the search string in the element-value-query to 
'139* *'.  That should match one or more words with the first word starting 
with '139'.  Another option would be to keep the string the same but use an 
element-word-query, but that will also return positive for values like 
'890/1396/ZZ', where any "word" in the element starts with 139.
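
A minimal way to try the second option against an in-memory node (untested; 
the element name "a" and the sample value are just for illustration):

```xquery
xquery version "1.0-ml";

(: The sample value has a "word" starting with 139 that is not the first
   word, which is the case described above. :)
let $x := <a>890/1396/ZZ</a>
return
  cts:contains($x,
    cts:element-word-query(xs:QName("a"), "139*", ("wildcarded")))
```

If the word-boundary explanation is right, this should come back true, which 
is exactly the false positive you would need to filter out afterwards.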

Doug Glidden
Software Engineer
The Boeing Company
[email protected]

________________________________
From: [email protected] 
[mailto:[email protected]] On Behalf Of Dominic Beesley
Sent: Tuesday, February 09, 2010 13:09
To: [email protected]
Subject: [MarkLogic Dev General] element-value-query with wildcards

Hello,

I've been trying to get the following search working:

    cts:and-query(
    (
        cts:element-attribute-value-query(xs:QName("xm:field"),
            xs:QName("type"), "number")
      , cts:element-value-query(xs:QName("xm:field"),('139*'),("wildcarded"))
     )
    )

This should search for an <xm:field type='number'> element containing something 
starting with 139. However I don't seem to be able to get this to work. It 
doesn't return all the values in my data.

It brings back:  "1398" but I know there is also a value "139/2004/EC" which 
I'd also like it to find.

If I change this to:

    cts:and-query(
    (
        cts:element-attribute-value-query(xs:QName("xm:field"),
            xs:QName("type"), "number")
      , cts:element-value-query(xs:QName("xm:field"),('139/2004/EC'),
            ("wildcarded"))
     )
    )

It works but I need to be able to find both at once if possible.

I've set all the wildcard options to true on the database and added an "element 
word lexicon" and reindexed. Will this not work? Do I need a different type of 
lexicon, range?

Any help gratefully received

Cheers

Dom
