It’s a bit off-topic, but you can do that easily with BaseX:
Create a DB, load all documents, then:
index:element-names("DB_NAME")
Szabolcs
From: [email protected]
[mailto:[email protected]] On Behalf Of Brent Hartwig
Sent: 26 March 2012 14:45
To: sai shanker; MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
Curious that there aren’t functions like this that tap into the universal index.
-Brent
From: [email protected]
[mailto:[email protected]] On Behalf Of sai shanker
Sent: Monday, March 26, 2012 9:23 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
Hi,
You can loop across each document, grab all the child nodes, and put their
names in a map.
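A minimal sketch of that loop, assuming "child nodes" means all descendant elements and that the MarkLogic map:map type is used as the map. Whether this stays within the expanded tree cache for a given database is not guaranteed.

```xquery
xquery version "1.0-ml";

(: Sketch only: record each element's local name as a key in a map,
   visiting one document at a time. :)
let $names := map:map()
let $_ :=
  for $doc in fn:collection()
  (: de-duplicate within the document first to keep the work small :)
  for $name in fn:distinct-values($doc//*/fn:local-name(.))
  return map:put($names, $name, fn:true())
for $name in map:keys($names)
order by $name
return $name
```

The map keys are unique by construction, so no final distinct-values() call is needed.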
Thanks and Regards,
Sai.
From: Ryan Dew <[email protected]>
To: MarkLogic Developer Discussion <[email protected]>
Sent: Monday, March 26, 2012 9:14 AM
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
You could try a recursive function like the following. There is no guarantee
that it is 100% right if you have sub-elements with the same names as your
root elements.
xquery version "1.0-ml";

declare function local:find-unique-qnames($found-qnames as xs:QName*) {
  let $next-qname := cts:search(collection()/*,
    if (exists($found-qnames))
    then cts:not-query(cts:element-query($found-qnames, cts:and-query(())))
    else cts:and-query(())
  )[1]/node-name(.)
  return
    if (exists($next-qname))
    then local:find-unique-qnames(($found-qnames, $next-qname))
    else $found-qnames
};

declare function local:find-unique-qnames() {
  for $qn in local:find-unique-qnames(())
  order by string($qn)
  return $qn
};

local:find-unique-qnames()
On Mon, Mar 26, 2012 at 6:36 AM, Geert Josten <[email protected]> wrote:
Hi Vishnu,
It would help if you could explain why you need that list. But in general the
best option would be to pre-calculate the list I guess. You can save it as a
server-field (xdmp:set-server-field), to keep the list in memory on each host.
But you would need an algorithm to initialize it, and each doc commit would
have to check and update that list. The latter can be done with a post-commit
trigger. The first can be done best by the strategy I already mentioned: divide
all docs into chunks of 100 to 1000 docs, calculate the distinct names of each
chunk, and merge those into the final list.
You could also raise the tree size setting temporarily to do that initial
calculation.
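A rough sketch of the server-field caching described above. The field name "element-names" and the helper local:compute-names() are hypothetical, and the helper body is only a placeholder for whatever initial calculation is chosen. Keeping the list current on each commit (the post-commit trigger) is not shown.

```xquery
xquery version "1.0-ml";

(: Hypothetical field name, chosen for illustration only. :)
declare variable $FIELD as xs:string := "element-names";

(: Placeholder for the initial calculation; in practice this would be
   the chunked computation rather than one pass over everything. :)
declare function local:compute-names() as xs:string* {
  fn:distinct-values(fn:collection()//*/fn:local-name(.))
};

(: Return the cached list, computing and storing it on first use.
   Server fields are kept in memory per host, so each host
   initializes its own copy. :)
declare function local:element-names() as xs:string* {
  let $cached := xdmp:get-server-field($FIELD)
  return
    if (fn:exists($cached))
    then $cached
    else xdmp:set-server-field($FIELD, local:compute-names())
};

local:element-names()
```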
Kind regards,
Geert
From: [email protected]
[mailto:[email protected]] On Behalf Of VISH RAJPUT
Sent: Monday, 26 March 2012 14:29
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
Thanks Geert,
Is there any alternate solution to find the unique elements within a database?
Warm Regards,
Vishnu
On Mon, Mar 26, 2012 at 5:55 PM, Geert Josten <[email protected]> wrote:
Hi Vishnu,
90 MB isn’t much indeed, but MarkLogic is configured to keep a low memory
footprint, even if there are 30 concurrent requests. To ensure that, the
tree size limit (look at the database setting in the admin interface) is
usually pretty low. I have 8 GB and still it is set to no more than 85 MB by
default. But you can increase it if you like.
A more streaming approach, which my advice attempts to achieve to some extent,
helps keep the footprint low and keeps MarkLogic fast.
Kind regards,
Geert
From: [email protected]
[mailto:[email protected]] On Behalf Of VISH RAJPUT
Sent: Monday, 26 March 2012 14:17
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
Thanks Geert,
But it still shows:
XDMP-EXPNTREECACHEFULL:
distinct-values(collection("ContentAnalysis")//*/local-name()) -- Expanded tree
cache full on host....
The overall database size is only 90 MB; I don't think that is a huge amount
of data for MarkLogic.
Regards,
Vishnu
On Mon, Mar 26, 2012 at 1:25 PM, Geert Josten <[email protected]> wrote:
Hi Vishnu,
Your FLWOR expression won’t return distinct names, since you are applying the
function to each individual name. You should write:
distinct-values(
  for $a in //*
  return local-name($a)
)
Or better:
distinct-values(collection()//*/local-name())
But this still might not perform well, or still max out on list or tree caches.
This approach is creating a complete list of all element names first, and
starts applying distinct-values only thereafter. You might consider taking
multiple steps, like per doc first, and then clustering per 100 files, and only
then all clusters. You could also just take 100 random samples, and use that.
That doesn’t guarantee a 100% complete list, but it remains performant even if
your database grows 10 or 100 fold.
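The multi-step idea could be sketched roughly as follows. This assumes the URI lexicon is enabled (cts:uris requires it), and the chunk size of 100 is just an illustrative value. Whether a single request over all chunks stays within the cache limits is not guaranteed; each chunk could also be run as a separate request.

```xquery
xquery version "1.0-ml";

(: Illustrative chunk size; tune to the data. :)
declare variable $chunk-size as xs:integer := 100;

let $uris := cts:uris()  (: assumes the URI lexicon is enabled :)
let $chunk-count := xs:integer(fn:ceiling(fn:count($uris) div $chunk-size))
let $per-chunk :=
  for $i in 1 to $chunk-count
  let $chunk := fn:subsequence($uris, ($i - 1) * $chunk-size + 1, $chunk-size)
  (: reduce each chunk to its distinct names before merging :)
  return fn:distinct-values(fn:doc($chunk)//*/fn:local-name(.))
(: merge the per-chunk lists into the final list :)
return fn:distinct-values($per-chunk)
```

Only the short per-chunk name lists are carried forward, so the final distinct-values() call works on far fewer items than the full list of element names.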
Kind regards,
Geert
From: [email protected]
[mailto:[email protected]] On Behalf Of VISH RAJPUT
Sent: Monday, 26 March 2012 8:29
To: [email protected]
Subject: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
The total size of all the files is approximately 90 MB.
---------- Forwarded message ----------
From: VISH RAJPUT <[email protected]>
Date: Mon, Mar 26, 2012 at 11:56 AM
Subject: [1.0-ml] XDMP-EXPNTREECACHEFULL
To: [email protected]
Hi,
I have 2000 files in a MarkLogic database within a single forest, and I want to
find the unique element names across all 2000 files.
For this I wrote the query below:
for $a in //*
return distinct-values($a/local-name())
But with this I got the error "[1.0-ml] XDMP-EXPNTREECACHEFULL". What should I do?
Regards,
Vishnu Singh
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general