Nathan wrote:
> Bernhard Schandl wrote:
>
>> On Nov 12, 2009, at 14:13 , Bernhard Schandl wrote:
>>
>>
>>> I would be interested in statistics per resources, e.g., the average
>>> and maximum number of triples per subject. Can you provide such numbers?
>>>
>> Sorry, this is maybe a little bit too unspecific; especially the
>> distribution of triple numbers (i.e., how many resources have 1-10
>> triples, how many have 11-100, and so on) would be of interest.
>>
>
> couldn't you SPARQL that yourself?
>
> as in; it's my understanding (at my current newbie level of knowledge)
> that this information should be accessible via some SPARQL queries,
> indeed my understanding of RDF was to expose data so that just such
> information could be extracted - ie surely that's the point of rdf and
> sparql?
>
You can definitely use SPARQL for this sort of thing, but because
the DBpedia dumps are just N-Triples (.nt) files, you can also get good
results with good old-fashioned shell pipelines built from awk, grep,
sort, uniq and such, and they are often faster. For example, this
prints the number of triples per subject:

    bzcat *.nt.bz2 | awk '{print $1}' | sort | uniq -c

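To get the distribution Bernhard asked about (how many resources have 1-10 triples, 11-100, and so on), the per-subject counts can be fed into a second awk stage. This is only a sketch: the function name triple_histogram and the bucket boundaries are my own choices, and it assumes one triple per line with the subject as the first whitespace-separated field.

```shell
# Hypothetical helper: reads N-Triples on stdin and prints one
# "bucket<TAB>number-of-subjects" line per bucket.
triple_histogram() {
  # Stage 1: count triples per subject (same idea as the pipeline above).
  # LC_ALL=C makes sort compare raw bytes, which is usually faster.
  awk '{print $1}' | LC_ALL=C sort | uniq -c |
  # Stage 2: $1 is now the per-subject count; map it to a bucket.
  awk '{
    n = $1
    if (n <= 10)        b = "1-10"
    else if (n <= 100)  b = "11-100"
    else if (n <= 1000) b = "101-1000"
    else                b = "1001+"
    h[b]++
  }
  END {
    # Print buckets in a fixed order; empty buckets print as 0.
    split("1-10 11-100 101-1000 1001+", order, " ")
    for (i = 1; i <= 4; i++) printf "%s\t%d\n", order[i], h[order[i]]
  }'
}
```

Usage would be something like: bzcat *.nt.bz2 | triple_histogram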
I've had great luck processing Freebase, DBpedia and other data
sets with one- or two-pass processes that keep index structures in
memory, avoiding non-sequential I/O almost completely.
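As a rough illustration of the one-pass, in-memory approach (not the actual tooling I use), awk's associative arrays can hold the per-subject counts directly, which skips the sort step entirely. The function name subject_stats and its output format are made up for this sketch, and it assumes the set of distinct subjects fits in RAM.

```shell
# Hypothetical helper: reads N-Triples on stdin in a single sequential
# pass and prints the subject count, triple count, average and maximum
# number of triples per subject.
subject_stats() {
  awk '
    # Skip blank lines and comment lines.
    NF == 0 || $1 ~ /^#/ { next }
    # In-memory index: one counter per distinct subject.
    { count[$1]++ }
    END {
      for (s in count) {
        total += count[s]
        n++
        if (count[s] > max) max = count[s]
      }
      if (n > 0)
        printf "subjects=%d triples=%d avg=%.2f max=%d\n", n, total, total / n, max
    }'
}
```

Usage would be something like: bzcat *.nt.bz2 | subject_stats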
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion