Hi Mike. Thanks for the reply. Yeah, I was surprised that cts:triples was not as efficient as I had hoped. Adding the index works, but it just feels odd.
I really don't have a true use-case for needing the number. It's just that in testing/developing I found it odd that I could not get the answer. Estimating via a sample is OK now that I know the true number. I had been estimating using the count of sem:triple on a random set of 100 docs, and the estimate always ended up at about 48 million. That is off by 14%, but on these biggish numbers it still gives me a ballpark figure.

Regards,
David

David Ennis
Content Engineer
HintTech <http://www.hinttech.com>

On 22 February 2014 16:30, Michael Blakeley <[email protected]> wrote:

> It seems like this should be possible in SPARQL, but I think SPARQL
> doesn't have COUNT yet? When that's implemented it might also make sense
> to add an XQuery accessor, maybe something like cts:remainder. Another
> approach might be to make xdmp:estimate accurate for triples.
>
> The fact that count(doc()//sem:triple) is faster than count(cts:triples())
> may be a bug, or at least a missing optimization. If it's an important
> use-case for you, contact support.
>
> If you don't mind a little imprecision you can sample. This assumes the
> count of triples in the first triple document is representative of the
> rest of the database.
>
> count((//sem:triple)[1]/root()//sem:triple)
> * xdmp:estimate(//sem:triple)
>
> Of course you could sample more documents rather than just the first one,
> and adjust accordingly.
>
> -- Mike
>
> On 21 Feb 2014, at 23:04 , David Ennis <[email protected]> wrote:
>
> > Howdy.
> >
> > In trying to learn the details of the Triple Store in MarkLogic, I
> > decided to keep kicking it until it dies. To really stress it, I am
> > using a 1 CPU setup with 2 gig of memory and have loaded in ~42 million
> > triples. It grumbled a bit in the process, but succeeded, and the graph
> > endpoint on the REST interface is happy enough for some testing.
> >
> > But... I am stumped... How can I get the count of all of my triples?
> >
> > Documentation suggests fn:count( cts:triples() ) - but that is
> > unrealistic when you have any real volume.
> >
> > After some thought, I came up with this silly approach:
> >
> > - Added range index on sem:triples
> >
> > With this, I get OK results (considering hardware) when counting in the
> > following ways:
> > - cts:count-aggregate(cts:element-reference(xs:QName("sem:triple")))
> > - fn:count(doc()//sem:triple)
> >
> > This seems like a viable approach, because you can still play with the
> > triples like they are any other document, so I am getting the benefit
> > of the index. But for this I added an index just for this purpose,
> > which seems a bit silly.
> >
> > OK, maybe in production the question of how many triples I have is
> > irrelevant, but for testing it would be a nice thing to know.
> >
> > Does anyone else have any idea how to get a count of the number of
> > triples in a system?
> >
> > Regards,
> > David
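
For anyone reading this thread later, here is a minimal sketch of the sampling estimate discussed above, widened from the first triple document to a random sample of 100. It has not been run against the 42 million triple database. It assumes the triples are stored as sem:triple elements in the http://marklogic.com/semantics namespace, that the "score-random" option of cts:search is good enough for picking random documents, and that an empty cts:and-query inside cts:element-query matches any document containing the element. The variable names are made up for illustration.

```xquery
xquery version "1.0-ml";
declare namespace sem = "http://marklogic.com/semantics";

(: Assumption: sample size of 100, as in the estimate described in this thread. :)
let $sample-size := 100

(: Matches any document that contains at least one sem:triple element. :)
let $has-triples :=
  cts:element-query(xs:QName("sem:triple"), cts:and-query(()))

(: Random sample of triple documents; "score-random" randomizes the result order. :)
let $sample :=
  cts:search(fn:doc(), $has-triples, "score-random")[1 to $sample-size]

(: Unfiltered, index-only estimate of how many documents contain triples. :)
let $doc-estimate := xdmp:estimate(cts:search(fn:doc(), $has-triples))

return
  if (fn:empty($sample)) then 0
  else
    (: average triples per sampled document, scaled up to all triple documents :)
    fn:round(
      (fn:count($sample//sem:triple) div fn:count($sample)) * $doc-estimate
    )
```

The result is only as good as the sample: if the number of triples per document varies a lot, a larger $sample-size should tighten the figure at the cost of reading more documents.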
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
