Sorry, let me try that again.
Some more thoughts on this problem. I have documents like this:
<doc>
<groups>
<group>
<a>foo</a>
<b>bar</b>
<c>fizz</c>
<d>buzz</d>
</group>
<group>
<a>foo</a>
<b>any value could have more than one word</b>
<c>fizz</c>
<d>buzz</d>
</group>
</groups>
</doc>
And I want to compute co-occurrences on any pair of a, b, c, d, but
restricted to pairs within the same group.
Using positions, I can't be sure what the proximity should be for, say,
a and d, because the number of words in each element varies. Too low a
value will miss matches; too high will give false positives from the
next group.
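To make the proximity problem concrete, here is a toy calculation on the sample document above. It assumes one position per whitespace-separated word, counted in document order; that is a simplification for illustration and not necessarily how MarkLogic assigns positions.

```python
# Toy model only: one position per word, in document order.
# This is an assumption, not MarkLogic's actual position scheme.
groups = [
    {"a": "foo", "b": "bar", "c": "fizz", "d": "buzz"},
    {"a": "foo", "b": "any value could have more than one word",
     "c": "fizz", "d": "buzz"},
]

def start_positions(groups):
    """Return one {element name: position of its first word} dict per group."""
    pos = 0
    result = []
    for group in groups:
        starts = {}
        for name, value in group.items():
            starts[name] = pos
            pos += len(value.split())
        result.append(starts)
    return result

starts = start_positions(groups)
# Distance from a to d inside each group.
within = [s["d"] - s["a"] for s in starts]
# Distance from d of the first group to a of the second group.
across = starts[1]["a"] - starts[0]["d"]
print(within, across)
```

Under this model the a-to-d distance differs between the two groups, while d of the first group sits right next to a of the second, so any proximity wide enough for the longer group also reaches across the group boundary.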
Options:
1. Flatten the co-occurrences by creating an element for each pair (so 6
combinations) with the 2 values concatenated.
2. I'm embarrassed to admit thinking of it, but I could get
positions+proximity to work by inserting an element between the group
nodes with plenty of words in it to act as a buffer :)
3. Any bright ideas?
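For what it's worth, option 1 could be done as a pre-ingest step along these lines. This is only a sketch: the pair element names (a_b, a_c, ...) and the "|" separator are made up for illustration, and the indexing on the MarkLogic side (presumably a range index on each pair element) is not shown.

```python
from itertools import combinations
import xml.etree.ElementTree as ET

def add_pair_elements(group, sep="|"):
    """Append one element per pair of the group's children,
    e.g. <a_b>foo|bar</a_b>. Names and separator are hypothetical."""
    children = list(group)  # snapshot, so new pair elements are not re-paired
    for x, y in combinations(children, 2):
        pair = ET.SubElement(group, f"{x.tag}_{y.tag}")
        pair.text = f"{x.text}{sep}{y.text}"
    return group

doc = ET.fromstring(
    "<doc><groups>"
    "<group><a>foo</a><b>bar</b><c>fizz</c><d>buzz</d></group>"
    "<group><a>foo</a><b>any value could have more than one word</b>"
    "<c>fizz</c><d>buzz</d></group>"
    "</groups></doc>"
)
for group in list(doc.iter("group")):
    add_pair_elements(group)

# The first four children are the original a, b, c, d; the rest are pairs.
pair_tags = [child.tag for child in doc.find("groups/group")][4:]
print(pair_tags)
```

Each group ends up carrying its own six pair elements, so a pair value can only ever come from a single group, at the cost of storing every value several times.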
Thanks
Rob
On 26/08/2013 22:54, "Whitby, Rob" <[email protected]> wrote:
>Some more thoughts on this
>
><group>
> <a>a1</a>
><a>a1</a>
>
><a>a1</a>
>
><a>a1</a>
>
>
>
>
>On 26/08/2013 20:13, "Whitby, Rob" <[email protected]> wrote:
>
>>Thanks Mike,
>>
>>I was thinking about using positions, and the ordered option is a nice
>>trick I didn't see. Unfortunately, in reality there are other elements
>>inside author that we also need to do co-occurrences on, so the
>>proximity would have to be a guess at the max safe distance without
>>hitting the next author. Might still be possible though; it would
>>certainly be simpler than multiple fragment roots.
>>
>>I also thought of copying the authors into a separate document with the
>>fragment root on author just for the co-occurrence calculations.
>>
>>What I really need is a "same parent element" option on co-occurrences :)
>>
>>Cheers,
>>Rob
>>
>>
>>On 26/08/2013 19:46, "Michael Blakeley" <[email protected]> wrote:
>>
>>>Position lookups should work better than sub-fragments for those
>>>queries. To test this I believe you'll have to activate word-positions,
>>>element-word-positions, and element-value-positions in the database.
>>>
>>>For Q1 try wrapping the sibling terms in cts:element-query. I think that
>>>will work, but if it doesn't you should be able to use cts:near-query
>>>instead.
>>>
>>>For Q2, add positional options to the
>>>cts:element-value-co-occurrences call. You'll probably want
>>>'proximity=1', 'ordered', and always put 'country' before 'orgname'.
>>>
>>>If you want to stick with fragments for some other reason, that
>>>duplicate data idea is viable. Or
>>>http://docs.marklogic.com/cts:document-fragment-query might help.
>>>
>>>-- Mike
>>>
>>>On 26 Aug 2013, at 01:45 , "Whitby, Rob" <[email protected]>
>>>wrote:
>>>
>>>> I could use some help with fragment roots; I've never had to mess
>>>> with them before. Here's a sample document:
>>>>
>>>> <doc>
>>>> <content/>
>>>> <authors>
>>>> <author>
>>>> <country>UK</country>
>>>> <orgname>A</orgname>
>>>> </author>
>>>> <author>
>>>> <country>UK</country>
>>>> <orgname>A</orgname>
>>>> </author>
>>>> <author>
>>>> <country>India</country>
>>>> <orgname>C</orgname>
>>>> </author>
>>>> </authors>
>>>> </doc>
>>>>
>>>> There are range indexes on country and orgname.
>>>>
>>>> I'm trying to configure fragment roots to enable 2 types of query:
>>>>
>>>> Q1. search doc nodes with range query on country.
>>>> cts:search(/doc, cts:element-range-query(xs:QName("country"), "=",
>>>> "UK"), "unfiltered")
>>>>
>>>> Q2. co-occurrences on country and orgname - in the same author.
>>>> cts:element-value-co-occurrences(xs:QName("country"),
>>>> xs:QName("orgname"))
>>>>
>>>> With no fragment roots defined:
>>>> Q1. Works as expected
>>>> Q2. Doesn't restrict matches to within author node, so get pairs
>>>> (India, A), (UK, C).
>>>>
>>>> With fragment roots defined on doc and author:
>>>> Q1. Returns 2 results - one for each author fragment
>>>> Q2. Works as expected
>>>>
>>>> Is there any way to make both these queries work unfiltered? Maybe I
>>>> could duplicate the authors node but in a different namespace for the
>>>> fragment root to do the co-occurrences? Or is there a better way
>>>> without using fragment roots?
>>>>
>>>> Thanks
>>>> Rob
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>