Re: [MarkLogic Dev General] querying with multiple fragment roots

Whitby, Rob Tue, 03 Sep 2013 06:29:25 -0700

Thanks Ellis, that's interesting; I'll have a look into it some more. At
first impression I don't think it will help with our next requirement,
which is to do facets, but again restricted by group. So if someone
searches a=foo AND b=bar, the facet of b would only include the value bar,
as the other values of b are not from a matching group. Right now I think
the only solution is to break the document into multiple fragments, but
this gets quite complicated. I'm currently experimenting with
cts:document-fragment-query, with limited success.


The document now looks like this, with fragment roots on content and
group. 

<doc>

  <content>blah</content>
  <groups>
    <group>
      <a>foo</a>
      <b>bar</b>
      <c>fizz</c>
      <d>buzz</d>
    </group>
    <group>
      <a>foo</a>
      <b>any value could have more than one word</b>
      <c>fizz</c>
      <d>buzz</d>
    </group>
  </groups>
</doc>


That lets us do queries like this, but I'm struggling to get the query to
work for both search and facet:


let $query := cts:and-query((
  cts:document-fragment-query(cts:word-query("blah")),
  (: including this 2nd document-fragment-query makes search work but
     breaks the facet :)
  cts:document-fragment-query(cts:and-query((
    cts:element-range-query(xs:QName("a"), "=", "foo"),
    cts:element-range-query(xs:QName("b"), "=", "bar")
  )))
));

cts:search(/doc/content, $query, "unfiltered")
==> with document-fragment-query = 1 result per group match

cts:element-values(xs:QName("b"), (), (), $query)
==> without document-fragment-query = all values of b
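
To pin down the facet behaviour I'm after, here is a minimal sketch in
plain XQuery (no cts: functions, no indexes). It assumes an in-memory copy
of the sample document above and at most one <a> and one <b> per group.
It only illustrates the semantics: values of b are taken solely from groups
where both constraints hold. It is a filtered, path-based computation and
not an index-resolved facet, so it would not scale the way
cts:element-values does.

```xquery
xquery version "1.0";

(: Hypothetical in-memory copy of the sample document, for illustration. :)
let $doc :=
  <doc>
    <content>blah</content>
    <groups>
      <group><a>foo</a><b>bar</b><c>fizz</c><d>buzz</d></group>
      <group><a>foo</a><b>any value could have more than one word</b><c>fizz</c><d>buzz</d></group>
    </groups>
  </doc>

(: Keep only the groups in which BOTH constraints hold. :)
let $matching-groups := $doc/groups/group[a = "foo"][b = "bar"]

(: Facet values of b, drawn from the matching groups alone. :)
return distinct-values($matching-groups/b)
```

For a=foo AND b=bar only the first group qualifies, so the second group's
value of b is left out of the facet.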



On 03/09/2013 13:34, "Ellis Pritchard" <[email protected]> wrote:

>Hi Rob,
>
>This has been playing on my mind for some reason: could you achieve what
>you want using fields? It would be similar to your option 1 idea of
>flattening the pairs, but without the extra markup.
>
>(See the Fields section in "Inside MarkLogic Server" first)
>
>1/ Create a field "pair-a-b" on element <group> and exclude the elements
><c>...<d> that you don't want in that pair (<a> and <b> are then included
>implicitly). Likewise, create other fields for the other useful
>combinations.
>
>2/ Create field-range-indexes on field "pair-a-b" etc.
>
>("field value searches" also need to be enabled on the database).
>
>
>Then you can do e.g.
>
>cts:field-values('pair-a-b')
>
>which would bring back the valid co-occurrences for a and b
>
>and
>
>cts:search(/doc,cts:field-value-query('pair-a-b','foo bar'))
>
>which would bring back documents containing a == foo and b == bar.
>
>
>It might not be sophisticated enough, or give problems with
>space-separated values, but a quick experiment seems to work for me in
>the simple case.
>
>Ellis.
>
>
>On 26 Aug 2013, at 19:09, "Whitby, Rob" <[email protected]> wrote:
>
>> Sorry, let me try that again.
>> 
>> Some more thoughts on this problem. I have documents like this:
>> 
>> <doc>
>>  <groups>
>>    <group>
>>      <a>foo</a>
>>      <b>bar</b>
>>      <c>fizz</c>
>>      <d>buzz</d>
>>    </group>
>>    <group>
>>      <a>foo</a>
>>      <b>any value could have more than one word</b>
>>      <c>fizz</c>
>>      <d>buzz</d>
>>    </group>
>>  </groups>
>> </doc>
>> 
>> And I want to do co-occurrences on any pair of a,b,c,d but limited by
>> group.
>> 
>> Using positions I can't be sure what the proximity should be for say a,d
>> because the number of words in each element varies. Too low will miss
>> matches, too high will get false-positives from the next group.
>> 
>> Options
>> 
>> 1. Flatten the co-occurrences by creating an element for each pair (so 6
>> combinations) with the 2 values concatenated.
>> 
>> 2. I'm embarrassed to admit thinking it, but I could get
>> positions+proximity to work by inserting an element between the group
>> nodes with plenty of words in it to act as a buffer :)
>> 
>> 3. Any bright ideas?
>> 
>> 
>> Thanks
>> Rob
>> 
>> 
>> 
>> 
>> 
>> On 26/08/2013 22:54, "Whitby, Rob" <[email protected]> wrote:
>> 
>>> Some more thoughts on this...
>>> 
>>> <group>
>>> <a>a1</a>
>>> <a>a1</a>
>>> 
>>> <a>a1</a>
>>> 
>>> <a>a1</a>
>>> 
>>> 
>>> 
>>> 
>>> On 26/08/2013 20:13, "Whitby, Rob" <[email protected]> wrote:
>>> 
>>>> Thanks Mike,
>>>> 
>>>> I was thinking about using positions, and the ordered option is a nice
>>>> trick I didn't see. Unfortunately in reality there are other elements
>>>> inside author that we also need to do co-occurrences on, so the
>>>> proximity would have to be a guess on the max safe distance without
>>>> hitting the next author. Might still be possible though; it would
>>>> certainly be simpler than multiple fragment roots.
>>>> 
>>>> I also thought of copying the authors into a separate document with
>>>> the fragment root on author just for the co-occurrence calculations.
>>>> 
>>>> What I really need is a "same parent element" option on
>>>> co-occurrences :)
>>>> 
>>>> Cheers,
>>>> Rob
>>>> 
>>>> 
>>>> On 26/08/2013 19:46, "Michael Blakeley" <[email protected]> wrote:
>>>> 
>>>>> Position lookups should work better than sub-fragments for those
>>>>> queries. To test this I believe you'll have to activate
>>>>> word-positions, element-word-positions, and element-value-positions
>>>>> in the database.
>>>>> 
>>>>> For Q1 try wrapping the sibling terms in cts:element-query. I think
>>>>> that will work, but if it doesn't you should be able to use
>>>>> cts:near-query instead.
>>>>> 
>>>>> For Q2, add positional options to the
>>>>> cts:element-value-co-occurrences call. You'll probably want
>>>>> 'proximity=1', 'ordered', and always put 'country' before 'orgname'.
>>>>> 
>>>>> If you want to stick with fragments for some other reason, that
>>>>> duplicate data idea is viable. Or
>>>>> http://docs.marklogic.com/cts:document-fragment-query might help.
>>>>> 
>>>>> -- Mike
>>>>> 
>>>>> On 26 Aug 2013, at 01:45 , "Whitby, Rob" <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> I could use some help with fragment roots; I've never had to mess
>>>>>> with them before. Here's a sample document:
>>>>>> 
>>>>>> <doc>
>>>>>> <content/> 
>>>>>> <authors>
>>>>>>   <author>
>>>>>>     <country>UK</country>
>>>>>>     <orgname>A</orgname>
>>>>>>   </author>
>>>>>>   <author>
>>>>>>     <country>UK</country>
>>>>>>     <orgname>A</orgname>
>>>>>>   </author>
>>>>>>   <author>
>>>>>>     <country>India</country>
>>>>>>     <orgname>C</orgname>
>>>>>>   </author>
>>>>>> </authors>
>>>>>> </doc>
>>>>>> 
>>>>>> There are range indexes on country and orgname.
>>>>>> 
>>>>>> I'm trying to configure fragment roots to enable 2 types of query:
>>>>>> 
>>>>>> Q1. search doc nodes with range query on country.
>>>>>>   cts:search(/doc,
>>>>>>     cts:element-range-query(xs:QName("country"), "=", "UK"),
>>>>>>     "unfiltered")
>>>>>> 
>>>>>> Q2. co-occurrences on country and orgname - in the same author.
>>>>>>   cts:element-value-co-occurrences(xs:QName("country"),
>>>>>>     xs:QName("orgname"))
>>>>>> 
>>>>>> With no fragment roots defined:
>>>>>> Q1. Works as expected
>>>>>> Q2. Doesn't restrict matches to within the author node, so we also
>>>>>> get the cross pairs (India, A), (UK, C).
>>>>>> 
>>>>>> With fragment roots defined on doc and author:
>>>>>> Q1. Returns 2 results - one for each author fragment
>>>>>> Q2. Works as expected
>>>>>> 
>>>>>> Is there any way to make both these queries work unfiltered? Maybe I
>>>>>> could duplicate the authors node but in a different namespace for
>>>>>> the fragment root to do the co-occurrences? Or is there a better way
>>>>>> without using fragment roots?
>>>>>> 
>>>>>> Thanks
>>>>>> Rob
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> General mailing list
>>>>>> [email protected]
>>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>>> 
