Hi Rob,

This has been playing on my mind for some reason: could you achieve what you 
want using fields? It would be similar to your option 1 idea of flattening the 
pairs, but without the extra markup.

(See the Fields section in "Inside MarkLogic Server" first)

1/ Create a field "pair-a-b" on element <group> and exclude the elements you 
don't want in that pair, here <c> and <d> (which implicitly leaves <a> and <b> 
included). Likewise, create other fields for the other useful combinations.

2/ Create field-range-indexes on field "pair-a-b" etc.

("field value searches" also need to be enabled on the database).
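
For what it's worth, the setup above could probably be scripted with the Admin 
API along these lines. Treat it as an untested sketch: the database name 
"Documents", the no-namespace element names and the root collation are all 
assumptions, and the two steps may need to be saved in separate requests.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumptions: database "Documents", elements in no namespace. :)
let $field  := "pair-a-b"
let $dbid   := xdmp:database("Documents")
let $config := admin:get-configuration()

(: Step 1: a field rooted on <group> (not the document root),
   with <c> and <d> excluded so only <a> and <b> contribute. :)
let $config := admin:database-add-field($config, $dbid,
  admin:database-field($field, fn:false()))
let $config := admin:database-add-field-included-element($config, $dbid, $field,
  admin:database-included-element("", "group", 1.0, "", "", ""))
let $config := admin:database-add-field-excluded-element($config, $dbid, $field,
  admin:database-excluded-element("", ("c", "d")))

(: Enable field value searches for this field. :)
let $config := admin:database-set-field-value-searches($config, $dbid,
  $field, fn:true())

(: Step 2: a string range index on the field. :)
let $config := admin:database-add-range-field-index($config, $dbid,
  admin:database-range-field-index("string", $field,
    "http://marklogic.com/collation/", fn:false()))

return admin:save-configuration($config)
```

The same pattern would be repeated for each of the other pair fields, changing 
only the field name and the excluded elements.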


Then you can do e.g.

cts:field-values('pair-a-b')

which would bring back the valid co-occurrences for a and b

and

cts:search(/doc,cts:field-value-query('pair-a-b','foo bar'))

which would bring back documents containing a == foo and b == bar.
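
If you also want counts for each pair, something like the following should 
work once the field range index exists. It is only a sketch: it assumes the 
field is called "pair-a-b" as above, and the options are just one reasonable 
choice.

```xquery
xquery version "1.0-ml";

(: List each a/b value pair held in the field range index,
   most frequent first, with the number of fragments it occurs in. :)
for $pair in cts:field-values("pair-a-b", (),
                              ("frequency-order", "fragment-frequency"))
return fn:concat($pair, " (", cts:frequency($pair), ")")
```

Swapping "fragment-frequency" for "item-frequency" would count every 
occurrence rather than every fragment.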


It might not be sophisticated enough, or it might give problems with 
space-separated values, but a quick experiment seems to work for me in the 
simple case.

Ellis.


On 26 Aug 2013, at 19:09, "Whitby, Rob" <[email protected]> wrote:

> Sorry, let me try that again.
> 
> Some more thoughts on this problem. I have documents like this:
> 
> <doc>
>  <groups>
>    <group>
>      <a>foo</a>
>      <b>bar</b>
>      <c>fizz</c>
>      <d>buzz</d>
>    </group>
>    <group>
>      <a>foo</a>
>      <b>any value could have more than one word</b>
>      <c>fizz</c>
>      <d>buzz</d>
>    </group>
>  </groups>
> </doc>
> 
> And I want to do co-occurrences on any pair of a, b, c, d, but limited to
> values within the same group.
> 
> Using positions I can't be sure what the proximity should be for say a,d
> because the number of words in each element varies. Too low will miss
> matches, too high will get false-positives from the next group.
> 
> Options
> 
> 1. Flatten the co-occurrences by creating an element for each pair (so 6
> combinations) with the 2 values concatenated.
> 
> 2. I'm embarrassed to admit thinking it, but I could get
> positions+proximity to work by inserting an element between the group
> nodes with plenty of words in it to act as a buffer :)
> 
> 3. Any bright ideas?
> 
> 
> Thanks
> Rob
> 
> 
> 
> 
> 
> On 26/08/2013 22:54, "Whitby, Rob" <[email protected]> wrote:
> 
>> Some more thoughts on this...
>> 
>> <group>
>> <a>a1</a>
>> <a>a1</a>
>> 
>> <a>a1</a>
>> 
>> <a>a1</a>
>> 
>> 
>> 
>> 
>> On 26/08/2013 20:13, "Whitby, Rob" <[email protected]> wrote:
>> 
>>> Thanks Mike,
>>> 
>>> I was thinking about using positions, and the ordered option is a nice
>>> trick I didn't see. Unfortunately, in reality there are other elements
>>> inside author that we also need to do co-occurrences on, so the proximity
>>> would have to be a guess at the max safe distance without hitting the
>>> next author. It might still be possible though, and it would certainly be
>>> simpler than multiple fragment roots.
>>> 
>>> I also thought of copying the authors into a separate document with the
>>> fragment root on author just for the co-occurrence calculations.
>>> 
>>> What I really need is a "same parent element" option on co-occurrences :)
>>> 
>>> Cheers,
>>> Rob
>>> 
>>> 
>>> On 26/08/2013 19:46, "Michael Blakeley" <[email protected]> wrote:
>>> 
>>>> Position lookups should work better than sub-fragments for those
>>>> queries. To test this I believe you'll have to activate word-positions,
>>>> element-word-positions, and element-value-positions in the database.
>>>> 
>>>> For Q1 try wrapping the sibling terms in cts:element-query. I think that
>>>> will work, but if it doesn't you should be able to use cts:near-query
>>>> instead.
>>>> 
>>>> For Q2, add positional options to the
>>>> cts:element-value-co-occurrences call. You'll probably want
>>>> 'proximity=1', 'ordered', and always put 'country' before 'orgname'.
>>>> 
>>>> If you want to stick with fragments for some other reason, that
>>>> duplicate data idea is viable. Or
>>>> http://docs.marklogic.com/cts:document-fragment-query might help.
>>>> 
>>>> -- Mike
>>>> 
>>>> On 26 Aug 2013, at 01:45, "Whitby, Rob" <[email protected]> wrote:
>>>> 
>>>>> I could use some help with fragment roots; I've never had to mess with
>>>>> them before. Here's a sample document:
>>>>> 
>>>>> <doc>
>>>>> <content/> 
>>>>> <authors>
>>>>>   <author>
>>>>>     <country>UK</country>
>>>>>     <orgname>A</orgname>
>>>>>   </author>
>>>>>   <author>
>>>>>     <country>UK</country>
>>>>>     <orgname>A</orgname>
>>>>>   </author>
>>>>>   <author>
>>>>>     <country>India</country>
>>>>>     <orgname>C</orgname>
>>>>>   </author>
>>>>> </authors>
>>>>> </doc>
>>>>> 
>>>>> There are range indexes on country and orgname.
>>>>> 
>>>>> I'm trying to configure fragment roots to enable 2 types of query:
>>>>> 
>>>>> Q1. search doc nodes with range query on country.
>>>>>   cts:search(/doc, cts:element-range-query(xs:QName("country"), "=",
>>>>> "UK"), "unfiltered")
>>>>> 
>>>>> Q2. co-occurrences on country and orgname - in the same author.
>>>>>   cts:element-value-co-occurrences(xs:QName("country"),
>>>>> xs:QName("orgname"))
>>>>> 
>>>>> With no fragment roots defined:
>>>>> Q1. Works as expected
>>>>> Q2. Doesn't restrict matches to within the author node, so I get
>>>>> pairs (India, A), (UK, C).
>>>>> 
>>>>> With fragment roots defined on doc and author:
>>>>> Q1. Returns 2 results - one for each author fragment
>>>>> Q2. Works as expected
>>>>> 
>>>>> Is there any way to make both these queries work unfiltered? Maybe I
>>>>> could duplicate the authors node but in a different namespace for the
>>>>> fragment root to do the co-occurrences? Or is there a better way
>>>>> without using fragment roots?
>>>>> 
>>>>> Thanks
>>>>> Rob
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>> 
