On 10/10/2012 05:31 PM, [email protected] wrote:
>
> Let me share a bit of what I know about the page counts because
> I’ve been evaluating these as a subjective importance score too.
>
Great, thanks!
> It’s about 2 TB of data and I’ve been working with a slow
> connection so I need to work with samples of this data, not the whole
> thing.
>
> I tried sampling a week's worth of data and the results were a
> disaster. In the first week of August, Michael Phelps was the most
> important person in the world. Maybe that was true. But it’s not a
> good answer for a score that’s valid for all time. It’s clear that
> the “prior distribution of concepts” that people look up in Wikipedia
> is highly time dependent and that’s probably also true for other prior
> distributions.
>
OK, I think I don't quite understand what "page counts" really measure.
I hadn't expected this metric to take up so much space, nor to be so
volatile.
My impression was that this was a count of links from other Wikipedia
pages (or DBpedia entities) to the entity in question, which gives a
fairly objective measure of importance. (Well, "importance" may
still be a questionable interpretation, but it is not quite as bad as a
popularity contest.)
Am I misunderstanding what page counts measure?
Why do they take up so much memory? I'd think this could be compressed
into (roughly) one number per entity. Or are all the link origins
tracked, too? (That could be useful, for example if the links are to be
weighted according to the search criteria. But that would definitely
make things rather complex...)
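
To make "one number per entity" concrete, here is a minimal sketch of
what I had in mind. It assumes the links are available as
(source, target) pairs; the function name and the sample data are made
up for illustration, and I am not claiming this is how the page counts
or any DBpedia dataset are actually produced.

```python
from collections import Counter


def inbound_link_counts(links):
    """Collapse (source, target) link pairs into one count per target entity."""
    # Use a set so that repeated links from the same source page count once.
    unique_links = set(links)
    return Counter(target for _source, target in unique_links)


# Hypothetical sample data, not taken from any real dump.
sample_links = [
    ("Berlin", "Germany"),
    ("Munich", "Germany"),
    ("Berlin", "Germany"),  # duplicate link, counted once
    ("Germany", "Europe"),
]

counts = inbound_link_counts(sample_links)
```

Once the origins are collapsed like this, only a single integer per
entity remains, which is why the 2 TB figure surprised me.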
Thanks,
Stefan
--
...ich hab' noch einen Koffer in Berlin...
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion