Thanks Ellis, that's interesting, I'll look into it some more. At first
glance I don't think it will help with our next requirement, which is to
do facets, but again restricted by group. So if someone searches
a=foo AND b=bar, the facet on b would only include the value bar, as the
other values of b are not from a matching group. Right now I think the
only solution is to break the document into multiple fragments, but this
gets quite complicated. I'm currently experimenting with
cts:document-fragment-query, with limited success.
The document now looks like this, with fragment roots on content and
group.
<doc>
  <content>blah</content>
  <groups>
    <group>
      <a>foo</a>
      <b>bar</b>
      <c>fizz</c>
      <d>buzz</d>
    </group>
    <group>
      <a>foo</a>
      <b>any value could have more than one word</b>
      <c>fizz</c>
      <d>buzz</d>
    </group>
  </groups>
</doc>
That lets us do queries like this, but I'm struggling to get the query to
work for both search and facet:
let $query := cts:and-query((
  cts:document-fragment-query(cts:word-query("blah")),
  (: including this 2nd document-fragment-query makes search work but
     breaks the facet :)
  cts:document-fragment-query(cts:and-query((
    cts:element-range-query(xs:QName("a"), "=", "foo"),
    cts:element-range-query(xs:QName("b"), "=", "bar")
  )))
))
return (
  (: ==> with document-fragment-query = 1 result per group match :)
  cts:search(/doc/content, $query, "unfiltered"),
  (: ==> without document-fragment-query = all values of b :)
  cts:element-values(xs:QName("b"), (), (), $query)
)
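
To make the facet behaviour I'm after concrete, here is a rough sketch in
plain XQuery over an in-memory copy of the sample document above. It is
only an illustration of the intended semantics: it is filtered rather than
resolved from the indexes, so it would not scale, and the variable names
and the facet-value element are made up for the example.

```xquery
xquery version "1.0";

(: Illustration only: an in-memory copy of the sample document. :)
let $doc :=
  <doc>
    <content>blah</content>
    <groups>
      <group><a>foo</a><b>bar</b><c>fizz</c><d>buzz</d></group>
      <group><a>foo</a><b>any value could have more than one word</b><c>fizz</c><d>buzz</d></group>
    </groups>
  </doc>

(: Keep only the groups where a and b both match inside the same group. :)
let $matching := $doc/groups/group[a = "foo"][b = "bar"]

(: Facet on b, counted over the matching groups only. :)
for $value in distinct-values($matching/b)
order by $value
return
  <facet-value count="{count($matching[b = $value])}">{$value}</facet-value>
```

For a=foo AND b=bar this yields only the value bar for b, because the
second group matches on a but not on b and so contributes nothing to the
facet. That is the result I'd like to get unfiltered from the range
indexes.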
On 03/09/2013 13:34, "Ellis Pritchard" <[email protected]> wrote:
>Hi Rob,
>
>This has been playing on my mind for some reason: could you achieve what
>you want using fields? It would be similar to your option 1 idea of
>flattening the pairs, but without the extra markup.
>
>(See the Fields section in "Inside MarkLogic Server" first)
>
>1/ Create a field "pair-a-b" on element <group>, excluding the elements
><c> and <d> that you don't want in that pair (<a> and <b> are then
>implicitly included), and likewise create other fields for the other
>useful combinations.
>
>2/ Create field-range-indexes on field "pair-a-b" etc.
>
>("field value searches" also need to be enabled on the database).
>
>
>Then you can do e.g.
>
>cts:field-values('pair-a-b')
>
>which would bring back the valid co-occurrences for a and b
>
>and
>
>cts:search(/doc,cts:field-query('pair-a-b','foo bar'))
>
>which would bring back documents containing a == foo and b == bar.
>
>
>It might not be sophisticated enough, or give problems with
>space-separated values, but a quick experiment seems to work for me in
>the simple case.
>
>Ellis.
>
>
>On 26 Aug 2013, at 19:09, "Whitby, Rob" <[email protected]> wrote:
>
>> Sorry, let me try that again.
>>
>> Some more thoughts on this problem. I have documents like this:
>>
>> <doc>
>>   <groups>
>>     <group>
>>       <a>foo</a>
>>       <b>bar</b>
>>       <c>fizz</c>
>>       <d>buzz</d>
>>     </group>
>>     <group>
>>       <a>foo</a>
>>       <b>any value could have more than one word</b>
>>       <c>fizz</c>
>>       <d>buzz</d>
>>     </group>
>>   </groups>
>> </doc>
>>
>> And I want to do co-occurrences on any pair of a,b,c,d but limited by
>> group.
>>
>> Using positions I can't be sure what the proximity should be for say a,d
>> because the number of words in each element varies. Too low will miss
>> matches, too high will get false-positives from the next group.
>>
>> Options
>>
>> 1. Flatten the co-occurrences by creating an element for each pair (so 6
>> combinations) with the 2 values concatenated.
>>
>> 2. I'm embarrassed to admit thinking it, but I could get
>> positions+proximity to work by inserting an element between the group
>> nodes with plenty of words in it to act as a buffer :)
>>
>> 3. Any bright ideas?
>>
>>
>> Thanks
>> Rob
>>
>>
>>
>>
>>
>> On 26/08/2013 22:54, "Whitby, Rob" <[email protected]> wrote:
>>
>>> Some more thoughts on this...
>>>
>>> <group>
>>> <a>a1</a>
>>> <a>a1</a>
>>>
>>> <a>a1</a>
>>>
>>> <a>a1</a>
>>>
>>>
>>>
>>>
>>> On 26/08/2013 20:13, "Whitby, Rob" <[email protected]> wrote:
>>>
>>>> Thanks Mike,
>>>>
>>>> I was thinking about using positions, and the ordered option is a nice
>>>> trick I didn't see. Unfortunately in reality there are other elements
>>>> inside author that we also need to do co-occurrences on, so the
>>>> proximity would have to be a guess on the max safe distance without
>>>> hitting the next author. Might still be possible though, it would
>>>> certainly be simpler than multiple fragment roots.
>>>>
>>>> I also thought of copying the authors into a separate document with
>>>> the fragment root on author just for the co-occurrence calculations.
>>>>
>>>> What I really need is a "same parent element" option on
>>>> co-occurrences :)
>>>>
>>>> Cheers,
>>>> Rob
>>>>
>>>>
>>>> On 26/08/2013 19:46, "Michael Blakeley" <[email protected]> wrote:
>>>>
>>>>> Position lookups should work better than sub-fragments for those
>>>>> queries. To test this I believe you'll have to activate
>>>>> word-positions, element-word-positions, and element-value-positions
>>>>> in the database.
>>>>>
>>>>> For Q1 try wrapping the sibling terms in cts:element-query. I think
>>>>> that will work, but if it doesn't you should be able to use
>>>>> cts:near-query instead.
>>>>>
>>>>> For Q2, add positional options to the
>>>>> cts:element-value-co-occurrences call. You'll probably want
>>>>> 'proximity=1', 'ordered', and always put 'country' before 'orgname'.
>>>>>
>>>>> If you want to stick with fragments for some other reason, that
>>>>> duplicate data idea is viable. Or
>>>>> http://docs.marklogic.com/cts:document-fragment-query might help.
>>>>>
>>>>> -- Mike
>>>>>
>>>>> On 26 Aug 2013, at 01:45 , "Whitby, Rob" <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I could use some help with fragment roots, I've never had to mess
>>>>>> with them before. Here's a sample document:
>>>>>>
>>>>>> <doc>
>>>>>>   <content/>
>>>>>>   <authors>
>>>>>>     <author>
>>>>>>       <country>UK</country>
>>>>>>       <orgname>A</orgname>
>>>>>>     </author>
>>>>>>     <author>
>>>>>>       <country>UK</country>
>>>>>>       <orgname>A</orgname>
>>>>>>     </author>
>>>>>>     <author>
>>>>>>       <country>India</country>
>>>>>>       <orgname>C</orgname>
>>>>>>     </author>
>>>>>>   </authors>
>>>>>> </doc>
>>>>>>
>>>>>> There are range indexes on country and orgname.
>>>>>>
>>>>>> I'm trying to configure fragment roots to enable 2 types of query:
>>>>>>
>>>>>> Q1. search doc nodes with range query on country.
>>>>>> cts:search(/doc,
>>>>>>   cts:element-range-query(xs:QName("country"), "=", "UK"),
>>>>>>   "unfiltered")
>>>>>>
>>>>>> Q2. co-occurrences on country and orgname - in the same author.
>>>>>> cts:element-value-co-occurrences(
>>>>>>   xs:QName("country"), xs:QName("orgname"))
>>>>>>
>>>>>> With no fragment roots defined:
>>>>>> Q1. Works as expected
>>>>>> Q2. Doesn't restrict matches to within the author node, so I get
>>>>>> pairs (India, A), (UK, C).
>>>>>>
>>>>>> With fragment roots defined on doc and author:
>>>>>> Q1. Returns 2 results - one for each author fragment
>>>>>> Q2. Works as expected
>>>>>>
>>>>>> Is there any way to make both these queries work unfiltered? Maybe I
>>>>>> could duplicate the authors node but in a different namespace for the
>>>>>> fragment root to do the co-occurrences? Or is there a better way
>>>>>> without using fragment roots?
>>>>>>
>>>>>> Thanks
>>>>>> Rob
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> General mailing list
>>>>>> [email protected]
>>>>>> http://developer.marklogic.com/mailman/listinfo/general