Re: [MarkLogic Dev General] Count number of triples

David Ennis Sat, 22 Feb 2014 08:29:21 -0800

Hi Mike.

Thanks for the reply. Yeah, I was surprised that cts:triples was not as
efficient as I had hoped. Adding the index works, but just feels odd.


I really don't have a true use-case for needing the number - it's just that
while testing/developing I found it odd that I could not get the answer.

Estimating via a sample is OK - now that I know the true number. I had
been estimating using the count of sem:triple on a random set of 100 docs
and the result always ended up at about 48 million - off by 14% - but on these
biggish numbers, it still gives me a ballpark figure.
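
For what it's worth, the arithmetic behind that kind of sample estimate can be
sketched as below. This is only a rough illustration in Python, not the query
I actually ran: the function names, the five sample counts and the document
estimate of 480,000 are all made up for the example. Only the ~48 million
result and the ~42 million actual count come from this thread.

```python
from statistics import mean


def estimate_triples(sample_counts, doc_estimate):
    """Scale the mean triples-per-document of a sample up to the whole database.

    sample_counts: sem:triple counts from the sampled documents (hypothetical)
    doc_estimate:  estimated number of documents that contain triples
    """
    if not sample_counts:
        raise ValueError("need at least one sampled document")
    return round(mean(sample_counts) * doc_estimate)


def relative_error(estimate, actual):
    """Signed error of the estimate as a fraction of the actual count."""
    return (estimate - actual) / actual


# Hypothetical inputs, chosen so the result matches the figures in this thread.
sample_counts = [96, 104, 100, 98, 102]  # triples counted in each sampled doc
doc_estimate = 480_000                   # assumed document estimate

estimate = estimate_triples(sample_counts, doc_estimate)
error = relative_error(estimate, 42_000_000)  # ~42 million triples were loaded
print(f"{estimate:,} triples estimated, {error:.0%} off")
```

A larger sample mainly tightens the triples-per-document average. If the
document estimate itself is off, the result is still off by the same factor.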

Regards,
David





On 22 February 2014 16:30, Michael Blakeley <[email protected]> wrote:

> It seems like this should be possible in SPARQL, but I think MarkLogic's
> SPARQL doesn't have COUNT yet? When that's implemented it might also make
> sense to add an XQuery accessor, maybe something like cts:remainder. Another
> approach might be to make xdmp:estimate accurate for triples.
>
> The fact that count(doc()//sem:triple) is faster than count(cts:triples())
> may be a bug, or at least a missing optimization. If it's an important
> use-case for you, contact support.
>
> If you don't mind a little imprecision you can sample. This assumes the
> count of triples in the first triple document is representative of the rest
> of the database.
>
>     count((//sem:triple)[1]/root()//sem:triple)
>     * xdmp:estimate(//sem:triple)
>
> Of course you could sample more documents rather than just the first one,
> and adjust accordingly.
>
> -- Mike
>
> On 21 Feb 2014, at 23:04 , David Ennis <[email protected]> wrote:
>
> > Howdy.
> >
> > In trying to learn the details of the Triple Store in MarkLogic, I
> > decided to keep kicking it until it dies. To really stress it, I am using a
> > 1 CPU setup with 2 gig of memory and have loaded in ~42 million triples.
> > It grumbled a bit in the process, but succeeded, and the graph endpoint on
> > the REST interface is happy enough for some testing.
> >
> > But...  I am stumped... How can I get the count of all of my triples?
> >
> > Documentation suggests fn:count( cts:triples() ) - but that is
> > unrealistic when you have any real volume.
> >
> > After some thought, I came up with this silly approach:
> >
> > - Added range index on sem:triple
> >
> > With this, I get OK results (considering hardware) when counting in the
> > following ways:
> > - cts:count-aggregate(cts:element-reference(xs:QName("sem:triple")))
> > - fn:count(doc()//sem:triple)
> >
> > This seems like a viable approach - because you can still play with the
> > triples like they are any other document, so I am getting the benefit of
> > the index. But for this I added an index just for this purpose, which
> > seems a bit silly.
> >
> > OK, maybe in production the question of how many triples I have is
> > irrelevant, but for testing, it would be a nice thing to know.
> >
> > Does anyone else have any idea how to get a count of the number of
> > triples in a system?
> >
> > Regards,
> > David
> > David Ennis
> > Content Engineer
> >
> > Mastering the value of content
> > creative | technology | content
> > Delftechpark 37i
> > 2628 XJ Delft
> > The Netherlands
> > T:    +31 88 268 25 00
> > M:    +31 6 000 000 00
> >
> >
> >
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://developer.marklogic.com/mailman/listinfo/general
>
