My *guess* is not that Helen is against using distinct-values(), but that
she's discovered, as all of us have, that using "pure XQuery" in ML on
large datasets often doesn't optimize as well as we'd like.
The solutions to this are various and will require more detailed knowledge
about the exact makeup of the dataset, such as how large it is, how many
documents it has, whether it's fragmented or not, etc.

I've generally found that using cts:search() or related things is
required to get decent performance out of ML. Sometimes (magically, to
me) I've stumbled on 'pure XQuery' or 'pure XPath' expressions that
happen to optimize well, but I've never found a good way to know for
sure without deep analysis of the specific query and lots of trial and
error.
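One of those "related things" is a lexicon lookup. Below is a sketch, not a tested recipe, that builds Helen's tree from range-index values instead of loading articles. It assumes element range indexes exist on volume and issue (in the spirit of Danny's coden index suggestion in the thread) and that the articles live in a collection named "journals"; that collection name and the function name are hypothetical.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumptions: element range indexes on volume and issue,
   and a collection named "journals" (hypothetical). Values come from the
   lexicons, so no article documents are loaded. Lexicon calls with a
   query are resolved from the indexes (unfiltered). :)
declare function local:lexicon-tree($code as xs:string, $collection as xs:string)
  as element(coden)
{
  let $coden-query :=
    cts:and-query((
      cts:collection-query($collection),
      cts:element-value-query(xs:QName("coden"), $code)
    ))
  return
    <coden name="{$code}">{
      (: distinct volumes among articles matching the query, in index order :)
      for $v in cts:element-values(xs:QName("volume"), (), (), $coden-query)
      return
        <volume name="{$v}">{
          (: distinct issues, further restricted to this volume :)
          for $i in cts:element-values(xs:QName("issue"), (), (),
                      cts:and-query((
                        $coden-query,
                        cts:element-value-query(xs:QName("volume"), fn:string($v))
                      )))
          return <issue name="{$i}"/>
        }</volume>
    }</coden>
};

local:lexicon-tree("AAA", "journals")
```

The design choice here is to trade index configuration for speed: each volume and issue list is read straight from the value lexicon, which returns distinct values in sorted order, so no fn:distinct-values or order by is needed.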

As for pre-processing: good question about performance versus updates.
This really comes down to how often you are updating vs. how often you
query.
If you're always going to need a tree version of the entire dataset,
it's worth building it up front (IMHO). But if it's occasional and you
only need a tiny subset of the data turned into a tree at any time,
especially if your updates are frequent compared to your queries, then
building it "on the fly" might be best.
And there are always hybrid approaches, like keeping a "cache" of the
tree-formatted data and regenerating it only on first hit.
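A minimal sketch of that hybrid "cache, regenerate on first hit" idea, under hypothetical assumptions: cached trees are stored at /tree-cache/{coden}.xml, and whatever code updates articles also deletes the affected cache document (for example with xdmp:document-delete) so the next request rebuilds it. The URI scheme and function name are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch only. Hypothetical assumptions: cache docs live at
   /tree-cache/{coden}.xml, and article updates delete the matching
   cache doc so it is rebuilt on the next request. :)
declare variable $code as xs:string := "AAA";

declare function local:build-tree($c as xs:string) as element(coden)
{
  let $articles :=
    cts:search(/article, cts:element-value-query(xs:QName("coden"), $c))
  return
    <coden name="{$c}">{
      for $v in fn:distinct-values($articles/volume)
      return
        <volume name="{$v}">{
          for $i in fn:distinct-values($articles[volume = $v]/issue)
          return <issue name="{$i}"/>
        }</volume>
    }</coden>
};

let $uri := fn:concat("/tree-cache/", $code, ".xml")
let $cached := fn:doc($uri)/coden
return
  if (fn:exists($cached))
  then $cached                                  (: cache hit: no article scan :)
  else
    let $tree := local:build-tree($code)
    (: first hit: store the tree for later requests, then return it :)
    return (xdmp:document-insert($uri, $tree), $tree)
```

One caveat with this shape: because the module contains xdmp:document-insert, MarkLogic runs the whole request as an update, taking locks even on a cache hit. A production version might keep the read path in a separate query-only module.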
 



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert
Josten
Sent: Wednesday, April 07, 2010 9:48 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] how to build the tree based on data

Hi Helen,

Why can't you use distinct-values when using search? You can always
apply distinct-values to the result of cts:search, for whatever purpose.
But since you say you are not interested in the unique values of
coden, you don't need to do both. Just write an XPath expression to get
the articles whose coden has a particular value, or use a search to get
the same. Then take this subset, find all volumes within it, and for
each volume determine the issues.

I think David was pretty close, but perhaps you are looking for
something more like this:

let $code := 'AAA'

let $selected-articles := //article[coden = $code]
return
<coden name="{$code}">  {
    let $volumes := fn:distinct-values( $selected-articles//volume )
    for $v in $volumes
    return
        <volume name="{$v}"> {
                (: distinct issue values of this volume, not whole articles :)
                for $i in fn:distinct-values( $selected-articles[volume eq $v]/issue )
                return  <issue name="{$i}"/>
        } </volume>
} </coden>
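To try the three steps (select one coden's articles, find its distinct volumes, then the distinct issues per volume) without a database, here is a self-contained sketch in plain XQuery 1.0 with no MarkLogic calls. It runs on inline copies of Helen's five sample articles; in MarkLogic each article would be its own document. The variable and function names are illustrative only.

```xquery
xquery version "1.0";

(: Inline copies of Helen's five sample articles. :)
declare variable $sample-articles as element(article)+ := (
  <article><coden>AAA</coden><volume>1</volume><issue>1</issue><paper>123</paper></article>,
  <article><coden>AAA</coden><volume>1</volume><issue>2</issue><paper>233</paper></article>,
  <article><coden>AAA</coden><volume>2</volume><issue>3</issue><paper>355</paper></article>,
  <article><coden>ACD</coden><volume>5</volume><issue>2</issue><paper>899</paper></article>,
  <article><coden>ABC</coden><volume>1</volume><issue>3</issue><paper>667</paper></article>
);

(: Group one coden's articles into volume/issue elements. :)
declare function local:build-tree($articles as element(article)*,
                                  $code as xs:string) as element(coden)
{
  let $selected := $articles[coden = $code]
  return
    <coden name="{$code}">{
      for $v in fn:distinct-values($selected/volume)
      order by $v
      return
        <volume name="{$v}">{
          for $i in fn:distinct-values($selected[volume = $v]/issue)
          order by $i
          return <issue name="{$i}"/>
        }</volume>
    }</coden>
};

local:build-tree($sample-articles, "AAA")
```

For coden AAA this should produce volume 1 with issues 1 and 2, and volume 2 with issue 3, matching the tree in Helen's original message.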

Kind regards,
Geert



drs. G.P.H. (Geert) Josten
Consultant


Daidalos BV
Hoekeindsehof 1-4
2665 JZ Bleiswijk

T +31 (0)10 850 1200
F +31 (0)10 850 1199

mailto:[email protected]
http://www.daidalos.nl/

KvK 27164984

Please consider the environment before printing this mail.
The information sent in or with this e-mail message originates from
Daidalos BV and is intended solely for the addressee. If you have
received this message unintentionally, we ask that you delete it. No
rights can be derived from this message.

> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Helen Chen
> Sent: Wednesday, April 7, 2010 15:34
> To: General Mark Logic Developer Discussion
> Cc: Helen Chen
> Subject: Re: [MarkLogic Dev General] how to build the tree
> based on data
>
> Hi Danny,
>
> I don't need to get the unique coden list; hopefully that will
> make the data set smaller.
>
> What I need to do is, for a specific coden (like AAA; of course it
> has to be a variable), build the tree of volumes and issues.
> I also realize that I need to specify the collection for my data
> set, which I guess means I have to use cts:search.
>
> But with search I cannot use fn:distinct-values, so how can the
> search give me the unique volume list and the unique issue list? I
> don't know how to make it work.
>
> Thanks, Helen
>
>
> On Apr 6, 2010, at 5:48 PM, Danny Sokolsky wrote:
>
> > If there is a range index on coden, then you can substitute the:
> >
> > fn:distinct-values(//coden)
> >
> > with
> >
> > cts:element-values(xs:QName("coden"))
> >
> > That should speed things up a bit....
> >
> > -Danny
> >
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]
> > ] On Behalf Of Lee, David
> > Sent: Tuesday, April 06, 2010 2:21 PM
> > To: General Mark Logic Developer Discussion
> > Subject: RE: [MarkLogic Dev General] how to build the tree based on
> > data
> >
> > How big is your data set?
> > The example I gave is pure XQuery. If the data is large, there may
> > not be much you can do entirely on demand. Making it go faster in ML
> > may involve pre-processing the data into a tree structure. I've had
> > to do the same thing myself, going from a flat structure to fast
> > trees.
> >
> >
> >
> >
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Helen
> > Chen
> > Sent: Tuesday, April 06, 2010 4:53 PM
> > To: General Mark Logic Developer Discussion
> > Cc: Helen Chen
> > Subject: Re: [MarkLogic Dev General] how to build the tree based on
> > data
> >
> > Hi David,
> >
> > Thanks for the help. I did the list for one coden; this coden is
> > a little bit big, and it took more than one minute to return the
> > full list (I added an order by to the for loop).
> >
> > That is a little too long for a real-time search result. I need it
> > quicker.
> >
> > Any suggestions?
> >
> > Thanks, Helen
> >
> >
> >
> >
> > On Apr 6, 2010, at 3:45 PM, Lee, David wrote:
> >
> >> Syntax typo
> >> Not:
> >> for $i in $issues := //article[volume eq $v and coden eq $c]
> >>
> >> Should be
> >>
> >> for $i in //article[volume eq $v and coden eq $c]
> >>
> >>
> >> Probably some others as well
> >>
> >>
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Lee,
> >> David
> >> Sent: Tuesday, April 06, 2010 3:40 PM
> >> To: General Mark Logic Developer Discussion
> >> Subject: RE: [MarkLogic Dev General] how to build the tree based on
> >> data
> >>
> >> Some very quick pseudocode... maybe this will give you some ideas.
> >>
> >> let $codens := fn:distinct-values( //coden)
> >> for $c in $codens
> >> return
> >>    <coden name="{$c}">
> >>    {
> >>            let $volumes := fn:distinct-values( $c//volume )
> >>            for $v in $volumes
> >>            return
> >>            <volume name="{$v}">
> >>            {
> >>                    for $i in $issues := //article[volume eq $v and coden eq $c]
> >>                    return  <issue name="{$i}"/>
> >>            }
> >>            </volume>
> >>    }
> >>    </coden>
> >>
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Helen
> >> Chen
> >> Sent: Tuesday, April 06, 2010 2:29 PM
> >> To: General Mark Logic Developer Discussion
> >> Cc: Helen Chen
> >> Subject: [MarkLogic Dev General] how to build the tree based on data
> >>
> >> Our data has the following structure:
> >>
> >> <article>...<coden></coden><volume></volume><issue></issue><paper></paper>....</article>
> >>
> >> The following are examples of the data; each <article> element is
> >> one XML document in MarkLogic:
> >>
> >> article 1:
> >> <article>...<coden>AAA</coden><volume>1</volume><issue>1</
> >> issue><paper>123</paper>....</article>
> >> article 2:
> >> <article>....<coden>AAA</coden><volume>1</volume><issue>2</
> >> issue><paper>233</paper>....</article>
> >> article 3:
> >> <article>....<coden>AAA</coden><volume>2</volume><issue>3</
> >> issue><paper>355</paper>....</article>
> >> article 4:
> >> <article>....<coden>ACD</coden><volume>5</volume><issue>2</
> >> issue><paper>899</paper>....</article>
> >> article 5:
> >> <article>....<coden>ABC</coden><volume>1</volume><issue>3</
> >> issue><paper>667</paper>....</article>
> >>
> >>
> >> I want to build a tree list based on coden
> >>
> >> coden
> >>     volume
> >>        issue
> >>
> >>
> >> So, based on the example data above, if I want to build the tree
> >> for coden AAA, I should get:
> >> AAA
> >>   vol 1
> >>     iss 1
> >>     iss 2
> >>   vol 2
> >>     iss 3
> >>
> >> Any suggestions on how to build it using marklogic?
> >>
> >> Thanks, Helen
> >> _______________________________________________
> >> General mailing list
> >> [email protected]
> >> http://xqzone.com/mailman/listinfo/general
> >
>
>
