On 13/01/11 14:40, Benson Margulies wrote:
On Thu, Jan 13, 2011 at 9:24 AM, Chris Dollin wrote:
On Thursday, January 13, 2011 02:15:06 pm Benson Margulies wrote:
I have a graph with 9054 reified statements in it.
What kind of graph?
It's a TDB graph.
How many statements in total?
133735
I'm using reification so that I can answer the question: 'how many
times has fact X been seen.' I could instead construct triples like
    _:b my:counts "suri-puri-ouri" .
    _:b my:count 22 .
That is, construct an id from the s/p/o and attach the count to that
single key, instead of using the standard reification quadlet. Then the
query to find the count would not involve correlating the four triples
of a quadlet. I guess I'd get drummed out of the regiment for violating
the RDF way here, but it might be worth it.
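A minimal sketch of the keyed-count idea described above, using a plain Python set of triples in place of a TDB graph. The `my:counts` and `my:count` property names come from the message; `fact_key`, `record_sighting`, `lookup_count`, and the space separator are hypothetical choices for illustration only, and the sketch assumes s, p and o are all URIs.

```python
from itertools import count

_blank_ids = count()  # source of fresh blank-node labels (illustrative only)


def fact_key(s, p, o):
    # Assumption: s, p and o are URIs, which cannot contain spaces,
    # so a space is an unambiguous separator for the combined key.
    return f"{s} {p} {o}"


def _find_node(graph, key):
    # Return the node whose my:counts value is `key`, or None.
    return next(
        (n for (n, pred, val) in graph if pred == "my:counts" and val == key),
        None,
    )


def record_sighting(graph, s, p, o):
    """Increment the count for fact (s, p, o) and return the new count.

    `graph` is a set of (subject, predicate, object) tuples.
    """
    key = fact_key(s, p, o)
    node = _find_node(graph, key)
    if node is None:
        # First sighting: two triples instead of a four-triple quadlet.
        node = f"_:b{next(_blank_ids)}"
        graph.add((node, "my:counts", key))
        graph.add((node, "my:count", 1))
        return 1
    old = next(v for (n, pred, v) in graph if n == node and pred == "my:count")
    graph.remove((node, "my:count", old))
    graph.add((node, "my:count", old + 1))
    return old + 1


def lookup_count(graph, s, p, o):
    """Return how many times fact (s, p, o) has been recorded (0 if never)."""
    node = _find_node(graph, fact_key(s, p, o))
    if node is None:
        return 0
    return next(v for (n, pred, v) in graph if n == node and pred == "my:count")
```

Looking up a count here matches one key and then one count triple, rather than joining the four triples of a reification quadlet. The cost is that the key is opaque to ordinary RDF tooling, which is the trade-off the message is weighing.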
A quick bit of what we used to call 'control-C profiling' shows that
the code is spending its time in a TDB B-tree.
One might imagine TDB building some sort of index behind my back to
optimize the 'listReifications' case.
Please provide a complete example (data and code for what you're doing).
It's a bit hard to comment without some data. (If the data is not
private, send it to me off-list.) TDB does have some reification
support, which is there so that Jena's reification support behaves
correctly on TDB. That it spends a lot of time in the B+Trees is to be
expected.
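A query:
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT (COUNT(*) AS ?C) { ?x rdf:subject ?z }
or, more correctly when the reification is broken or partial:
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT (COUNT(DISTINCT ?x) AS ?C) { ?x rdf:subject ?z }
will also get you the answer.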
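To illustrate why the two queries can disagree, here is a small sketch in plain Python over an in-memory set of triples, not over TDB. The sample data and the function names `count_matches` and `count_distinct_nodes` are made up for the example; node `_:r2` stands in for a "broken" reification that carries two rdf:subject values.

```python
RDF_SUBJECT = "rdf:subject"


def count_matches(graph):
    # Analogue of COUNT(*): one row per triple matching ?x rdf:subject ?z.
    return sum(1 for (_x, p, _z) in graph if p == RDF_SUBJECT)


def count_distinct_nodes(graph):
    # Analogue of COUNT(DISTINCT ?x): each reification node counted once.
    return len({x for (x, p, _z) in graph if p == RDF_SUBJECT})


graph = {
    # A complete reification quadlet.
    ("_:r1", "rdf:type", "rdf:Statement"),
    ("_:r1", "rdf:subject", "ex:a"),
    ("_:r1", "rdf:predicate", "ex:p"),
    ("_:r1", "rdf:object", "ex:b"),
    # A broken one: the same node has two rdf:subject values.
    ("_:r2", "rdf:subject", "ex:a"),
    ("_:r2", "rdf:subject", "ex:c"),
}

print(count_matches(graph))         # 3
print(count_distinct_nodes(graph))  # 2
```

On well-formed data, where every reification node has exactly one rdf:subject, the two counts coincide.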
Andy