Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

sai shanker Mon, 26 Mar 2012 06:23:34 -0700

Hi,
You can loop over the documents one at a time, grab the child nodes of each, 
and put them in a map.
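
A minimal sketch of that approach, assuming the goal is the distinct element 
local names; $names and $doc are illustrative names, and it has not been run 
against the database in this thread:

```xquery
xquery version "1.0-ml";

(: Use a map as a set: each element local name is stored once as a key. :)
let $names := map:map()
let $_ :=
  for $doc in collection()
  (: De-duplicate within one document before touching the map. :)
  for $name in distinct-values($doc//*/local-name())
  return map:put($names, $name, true())
for $name in map:keys($names)
order by $name
return $name
```

Working one document at a time keeps the intermediate list small, though 
whether it stays within the expanded tree cache depends on the server settings.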
Thanks and Regards,
Sai.

From: Ryan Dew <[email protected]>
To: MarkLogic Developer Discussion <[email protected]> 
Sent: Monday, March 26, 2012 9:14 AM
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL



You could try a recursive function like the following. There is no guarantee 
that it is 100% right if you have sub-elements with the same names as your 
root elements. 

xquery version "1.0-ml";

declare function local:find-unique-qnames($found-qnames as xs:QName*) {
  let $next-qname := cts:search(collection()/*, 
    if (exists($found-qnames))
    then cts:not-query(cts:element-query($found-qnames,cts:and-query(())))
    else cts:and-query(())
  )[1]/node-name(.)
  return if (exists($next-qname))
          then local:find-unique-qnames(($found-qnames,$next-qname))
          else $found-qnames
};

declare function local:find-unique-qnames() {
  for $qn in local:find-unique-qnames(())
  order by string($qn)
  return $qn
};

local:find-unique-qnames()

On Mon, Mar 26, 2012 at 6:36 AM, Geert Josten <[email protected]> wrote:

Hi Vishnu,
> 
>It would help if you could explain why you need that list. In general, though, 
>I guess the best option would be to pre-calculate the list. You can save it as 
>a server field (xdmp:set-server-field) to keep the list in memory on each host. 
>But you would need an algorithm to initialize it, and each doc commit would 
>have to check and update that list. The latter can be done with a post-commit 
>trigger. The former is best done with the strategy I already mentioned: 
>divide all docs into chunks of 100 to 1000 docs, calculate the distinct names 
>of each chunk, and merge those into the final list.
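> 
>A rough sketch of that chunk-and-merge initialization, assuming a chunk size 
>of 500 and a server field named "element-names" (both are illustrative 
>choices, not tested settings):
> 
>```xquery
>xquery version "1.0-ml";
>
>(: Assumed chunk size; anything from 100 to 1000 fits the advice above. :)
>declare variable $chunk-size as xs:integer := 500;
>
>(: Distinct element local names for one chunk of documents. :)
>declare function local:names-in-chunk($start as xs:integer) as xs:string* {
>  distinct-values(
>    subsequence(collection(), $start, $chunk-size)//*/local-name()
>  )
>};
>
>let $total := xdmp:estimate(collection())
>let $names :=
>  distinct-values(
>    (: Chunk start positions: 1, 501, 1001, ... :)
>    for $start in (1 to $total)[(. - 1) mod $chunk-size eq 0]
>    return local:names-in-chunk($start)
>  )
>(: Keep the merged list in memory on the host that evaluates this request. :)
>return xdmp:set-server-field("element-names", $names)
>```
> 
>All chunks still run inside one request here. If that request still fills the 
>cache, each chunk may need to run as its own request, for example through 
>xdmp:eval or xdmp:spawn.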
> 
>You could also raise the tree size setting temporarily to do that initial 
>calculation.
> 
>Kind regards,
>Geert
> 
>From: [email protected] 
>[mailto:[email protected]] On Behalf Of VISH RAJPUT
>Sent: Monday, 26 March 2012 14:29
>To: MarkLogic Developer Discussion
>Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
> 
>Thanks Geert,
> 
>Is there any alternate solution to find the unique elements within a database?
> 
>Warm Regards,
>Vishnu
> 
> 
>On Mon, Mar 26, 2012 at 5:55 PM, Geert Josten <[email protected]> wrote:
>Hi Vishnu,
> 
>90 MB isn’t much indeed, but MarkLogic is configured to keep a low memory 
>footprint, even if there are 30 concurrent requests. To ensure that, the 
>tree size limit (look at the database setting in the admin interface) is 
>usually pretty low. I have 8 GB and it is still set to no more than 85 MB by 
>default. But you can increase it if you like.
> 
>A more streaming approach, which my advice attempts to achieve to some extent, 
>helps keep the footprint low and keeps MarkLogic fast.
> 
>Kind regards,
>Geert
> 
>From: [email protected] 
>[mailto:[email protected]] On Behalf Of VISH RAJPUT
>Sent: Monday, 26 March 2012 14:17
>To: MarkLogic Developer Discussion
>Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
> 
>Thanks Geert,
> 
>But it still shows 
>XDMP-EXPNTREECACHEFULL: distinct-values(collection("ContentAnalysis")//*/local-name()) --
> Expanded tree cache full on host.... The overall database size is only 90 MB; 
>I don't think that is a huge amount of data for MarkLogic.
> 
> 
>Regards,
>Vishnu
> 
>On Mon, Mar 26, 2012 at 1:25 PM, Geert Josten <[email protected]> wrote:
>Hi Vishnu,
> 
>Your FLWOR expression won’t return distinct names, since you are applying the 
>function to each individual name. You should write:
> 
>distinct-values(
>    for $a in //*
>    return $a/local-name()
>)
> 
>Or better:
> 
>distinct-values(collection()//*/local-name())
> 
>But this still might not perform well, or might still max out the list or tree 
>caches. This approach builds a complete list of all element names first and 
>only then applies distinct-values. You might consider taking multiple steps: 
>per doc first, then clustering per 100 files, and only then all clusters. You 
>could also just take 100 random samples and use those. That doesn’t guarantee 
>a 100% complete list, but it remains performant even if your database grows 
>10- or 100-fold.
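> 
>A sketch of the sampling variant, assuming the "score-random" option of 
>cts:search is available on your server version and that a sample of 100 
>documents is representative enough:
> 
>```xquery
>xquery version "1.0-ml";
>
>(: Random scores put the results in random order; keep the first 100 docs. :)
>distinct-values(
>  subsequence(
>    cts:search(collection(), cts:and-query(()), "score-random"),
>    1, 100
>  )//*/local-name()
>)
>```
> 
>Element names that occur only in unsampled documents will be missed, so this 
>trades completeness for a bounded cost.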
> 
>Kind regards,
>Geert
> 
>From: [email protected] 
>[mailto:[email protected]] On Behalf Of VISH RAJPUT
>Sent: Monday, 26 March 2012 8:29
>To: [email protected]
>Subject: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
> 
>The total size of all the files is approximately 90 MB.
>---------- Forwarded message ----------
>From: VISH RAJPUT <[email protected]>
>Date: Mon, Mar 26, 2012 at 11:56 AM
>Subject: [1.0-ml] XDMP-EXPNTREECACHEFULL
>To: [email protected]
>
>
>Hi,
> 
>I have 2000 files in a MarkLogic database within a single forest, and I want 
>to find the unique element names across all 2000 files. For this I wrote the 
>query below:
> 
>for $a in //*
>return distinct-values($a/local-name())
> 
>But with this I got the error "[1.0-ml] XDMP-EXPNTREECACHEFULL". What should I do?
> 
> 
>Regards,
>Vishnu Singh
> 
>
>_______________________________________________
>General mailing list
>[email protected]
>http://developer.marklogic.com/mailman/listinfo/general
> 
>
