Sorry, let me try that again.

Some more thoughts on this problem. I have documents like this:

<doc>
  <groups>
    <group>
      <a>foo</a>
      <b>bar</b>
      <c>fizz</c>
      <d>buzz</d>
    </group>
    <group>
      <a>foo</a>
      <b>any value could have more than one word</b>
      <c>fizz</c>
      <d>buzz</d>
    </group>
  </groups>
</doc>

I want to compute co-occurrences on any pair of a, b, c, d, but
restricted to values within the same group.

Using positions, I can't be sure what the proximity should be for, say,
a and d, because the number of words in each element varies. Too low a
value will miss matches; too high will pick up false positives from the
next group.
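
To make the proximity problem concrete, here is a rough Python sketch that assigns word positions to the sample document above and measures a few distances. It is only a simplified model: it assumes one position per whitespace-separated word and no gap at element boundaries, which is not necessarily how MarkLogic counts positions. The helper names (element_spans, distance) are made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<doc>
  <groups>
    <group><a>foo</a><b>bar</b><c>fizz</c><d>buzz</d></group>
    <group><a>foo</a><b>any value could have more than one word</b><c>fizz</c><d>buzz</d></group>
  </groups>
</doc>
"""

def element_spans(xml_text):
    """Return (group_index, tag, first_pos, last_pos) for each child of a group.

    Assumption: every whitespace-separated word takes one position and
    element boundaries add no gap.
    """
    spans = []
    pos = 0
    root = ET.fromstring(xml_text)
    for g, group in enumerate(root.iter("group")):
        for child in group:
            words = (child.text or "").split()
            if not words:
                continue
            spans.append((g, child.tag, pos, pos + len(words) - 1))
            pos += len(words)
    return spans

def distance(spans, from_group, from_tag, to_group, to_tag):
    """Number of positions from the start of one element to the start of another."""
    start = next(s for s in spans if s[0] == from_group and s[1] == from_tag)
    end = next(s for s in spans if s[0] == to_group and s[1] == to_tag)
    return end[2] - start[2]

spans = element_spans(SAMPLE)
same_group_short = distance(spans, 0, "a", 0, "d")  # a to d inside group 1
same_group_long = distance(spans, 1, "a", 1, "d")   # a to d inside group 2
cross_group = distance(spans, 0, "d", 1, "a")       # group 1's d to group 2's a

print(same_group_short, same_group_long, cross_group)
```

Under this model the script prints 3 10 1: a proximity of 3 covers the first group but misses the second, where a and d are 10 positions apart, while the d of one group sits only 1 position from the a of the next.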

Options

1. Flatten the co-occurrences by creating an element for each pair (so 6
combinations) with the 2 values concatenated.

2. I'm embarrassed to admit thinking it, but I could get
positions+proximity to work by inserting an element between the group
nodes with plenty of words in it to act as a buffer :)

3. Any bright ideas?
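
A minimal sketch of option 1, written in Python as a hypothetical pre-ingest transform rather than anything MarkLogic-specific. It appends one element per pair of fields to a group, with the two values concatenated. The element names (pair-a-b and so on) and the "|" separator are assumptions chosen for illustration, not something taken from an existing schema.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

FIELDS = ("a", "b", "c", "d")
SEPARATOR = "|"  # hypothetical delimiter; pick one that cannot occur in the values

def add_pair_elements(group):
    """Append one <pair-x-y> element per pair of fields to a <group> element."""
    values = {tag: group.findtext(tag) for tag in FIELDS}
    for left, right in combinations(FIELDS, 2):
        if values[left] is None or values[right] is None:
            continue  # skip pairs where one side is missing
        pair = ET.SubElement(group, f"pair-{left}-{right}")
        pair.text = f"{values[left]}{SEPARATOR}{values[right]}"
    return group

group = ET.fromstring(
    "<group><a>foo</a><b>bar</b><c>fizz</c><d>buzz</d></group>"
)
add_pair_elements(group)

# Collect the generated pair elements: 4 fields give 6 combinations.
pairs = {el.tag: el.text for el in group if el.tag.startswith("pair-")}
print(pairs)
```

For the first sample group this yields six pair elements, for example pair-a-d with the value foo|buzz. Because each pair lives in a single element, the values can never mix across groups, at the cost of storing every value several times.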


Thanks
Rob





On 26/08/2013 22:54, "Whitby, Rob" <[email protected]> wrote:

>Some more thoughts on this...
>
><group>
>  <a>a1</a>
><a>a1</a>
>
><a>a1</a>
>
><a>a1</a>
>
>
>
>
>On 26/08/2013 20:13, "Whitby, Rob" <[email protected]> wrote:
>
>>Thanks Mike,
>>
>>I was thinking about using positions, and the ordered option is a nice
>>trick I didn't see. Unfortunately in reality there are other elements
>>inside author that we also need to do co-occurrences on so the proximity
>>would have to be a guess on the max safe distance without hitting the
>>next author. Might still be possible though, it would certainly be simpler
>>than multiple fragment roots.
>>
>>I also thought of copying the authors into a separate document with the
>>fragment root on author just for the co-occurrence calculations.
>>
>>What I really need is a "same parent element" option on co-occurrences :)
>>
>>Cheers,
>>Rob
>>
>>
>>On 26/08/2013 19:46, "Michael Blakeley" <[email protected]> wrote:
>>
>>>Position lookups should work better than sub-fragments for those queries.
>>>To test this I believe you'll have to activate word-positions,
>>>element-word-positions, and element-value-positions in the database.
>>>
>>>For Q1 try wrapping the sibling terms in cts:element-query. I think that
>>>will work, but if it doesn't you should be able to use cts:near-query
>>>instead.
>>>
>>>For Q2, add positional options to the
>>>cts:element-value-co-occurrences call. You'll probably want
>>>'proximity=1', 'ordered', and always put 'country' before 'orgname'.
>>>
>>>If you want to stick with fragments for some other reason, that duplicate
>>>data idea is viable. Or
>>>http://docs.marklogic.com/cts:document-fragment-query might help.
>>>
>>>-- Mike
>>>
>>>On 26 Aug 2013, at 01:45, "Whitby, Rob" <[email protected]> wrote:
>>>
>>>> I could use some help with fragment roots, I've never had to mess with
>>>> them before.. here's a sample document:
>>>> 
>>>> <doc>
>>>>  <content/> 
>>>>  <authors>
>>>>    <author>
>>>>      <country>UK</country>
>>>>      <orgname>A</orgname>
>>>>    </author>
>>>>    <author>
>>>>      <country>UK</country>
>>>>      <orgname>A</orgname>
>>>>    </author>
>>>>    <author>
>>>>      <country>India</country>
>>>>      <orgname>C</orgname>
>>>>    </author>
>>>>  </authors>
>>>> </doc>
>>>> 
>>>> There are range indexes on country and orgname.
>>>> 
>>>> I'm trying to configure fragment roots to enable 2 types of query:
>>>> 
>>>> Q1. search doc nodes with range query on country.
>>>>    cts:search(/doc, cts:element-range-query(xs:QName("country"), "=",
>>>> "UK"), "unfiltered")
>>>> 
>>>> Q2. co-occurrences on country and orgname - in the same author.
>>>>    cts:element-value-co-occurrences(xs:QName("country"),
>>>> xs:QName("orgname"))
>>>> 
>>>> With no fragment roots defined:
>>>> Q1. Works as expected
>>>> Q2. doesn't restrict matches to within author node, so get pairs
>>>> (India, A), (UK, C).
>>>> 
>>>> With fragment roots defined on doc and author:
>>>> Q1. Returns 2 results - one for each author fragment
>>>> Q2. Works as expected
>>>> 
>>>> Is there any way to make both these queries work unfiltered? Maybe I
>>>> could duplicate the authors node but in a different namespace for the
>>>> fragment root to do the co-occurrences? Or is there a better way without
>>>> using fragment roots?
>>>> 
>>>> Thanks
>>>> Rob
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>> 
