The counts you refer to are "incoming page links"; the ones Paul mentions are
"page views". The latter are aggregated by time period, so there are many
numbers for one page even after you roll them up into one number per week
(to use Paul's example). I think Paul's point is that it is difficult to find
a sampling period that yields one number per page reflecting whatever notion
of importance you have in mind.
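The roll-up Paul and I are describing can be sketched as follows. This assumes input lines in the classic Wikimedia "pagecounts" dump format ("project page_title view_count bytes_transferred"); the tiny in-memory sample is purely illustrative, not real data:

```python
# Minimal sketch: collapse hourly page-view lines into one total per page.
# Assumes the classic Wikimedia pagecounts line format:
#   project page_title view_count bytes_transferred
# The sample data below is made up for illustration.
from collections import defaultdict

def aggregate_views(lines):
    """Sum hourly view counts into one number per page title."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        project, title, count, _size = parts
        if project == "en":  # restrict to English Wikipedia
            totals[title] += int(count)
    return totals

sample = [
    "en Michael_Phelps 120 0",
    "en Michael_Phelps 95 0",
    "en Paris 40 0",
]
print(aggregate_views(sample)["Michael_Phelps"])  # 215
```

The catch, of course, is not the aggregation itself but choosing the window: sum over the first week of August and the totals mostly reflect the Olympics, not any time-stable importance.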
 On Oct 11, 2012 1:01 AM, "Stefan Seefeld" <[email protected]> wrote:

> On 10/10/2012 05:31 PM, [email protected] wrote:
> >
> >     Let me share a bit of what I know about the page counts because
> > I’ve been evaluating these as a subjective importance score too.
> >
>
> Great, thanks !
>
> >     It’s about 2 TB of data and I’ve been working with a slow
> > connection so I need to work with samples of this data,  not the whole
> > thing.
> >
> >     I tried sampling a week worth of data and the results were a
> > disaster.  In the first week of August,  Michael Phelps was the most
> > important person in the world.  Maybe that was true.  But it’s not a
> > good answer for a score that’s valid for all time.  It’s clear that
> > the “prior distribution of concepts” that people look up in Wikipedia
> > is highly time dependent and that’s probably also true for other prior
> > distributions.
> >
>
> OK, I think I don't quite understand what "page counts" really measure,
> as I hadn't expected this metric to take up so much extra space, and
> neither that it was so volatile.
> My impression was that this was a count of links from other wikipedia
> pages (or dbpedia entities) to the entity in question, which gives a
> rather non-subjective measure of importance. (Well, "importance" may
> still be a questionable interpretation, but still, not quite as bad as a
> popularity contest.)
> Am I misunderstanding what page counts measure ?
> Why do they take up so much memory ? I'd think this could be compressed
> into (roughly) one number per entity. Or are all the link origins
> tracked, too ? (That could be useful, for example if the links are to be
> weighted according to the search criteria. But that definitely makes
> things rather complex...)
>
> Thanks,
>         Stefan
>
> --
>
>       ...ich hab' noch einen Koffer in Berlin...
>
>
>
> ------------------------------------------------------------------------------
> Don't let slow site performance ruin your business. Deploy New Relic APM
> Deploy New Relic app performance management and know exactly
> what is happening inside your Ruby, Python, PHP, Java, and .NET app
> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
> http://p.sf.net/sfu/newrelic-dev2dev
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>