Hi Richard,

I'm pretty sure that the first two are not available
in the Wikipedia dumps. For example, [1]
lists pages-meta-current.xml.bz2 as "All pages,
current versions only." I don't think there is a dump
with all revisions of all pages, which is what you
would need to count edits. There may once have been
one, but it probably became too big.

For the view count, see [2]. But hey, I also found
the following at [3]: "Domas Mituzas put together a
system to gather access statistics from wikipedia's
squid cluster and publishes it here" [4].

The inlink count can of course be extracted, either
from the Wikipedia dump [5] or from DBpedia [6].

I wrote a bit of Java code that does exactly that
because I needed it for the faceted browser [7],
but didn't publish the results. I can send you the
code if you want, though it's not really "productized".
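
To give a rough idea of what counting inlinks from the
DBpedia page links file [6] involves, here is a minimal
sketch. It is not the Java code mentioned above; it is
written in Python for brevity. It assumes the file holds
one N-Triples statement per line with a URI as subject and
object, and that the object is the link target. The
function names are made up for illustration.

```python
import bz2
import re
from collections import Counter

# Assumed line shape (N-Triples): <subject> <predicate> <object> .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def count_inlinks(lines):
    """Count how often each URI appears as the object (link target)."""
    counts = Counter()
    for line in lines:
        match = TRIPLE.match(line)
        if match:  # skip comments, blank lines and malformed lines
            counts[match.group(3)] += 1
    return counts


def count_inlinks_in_file(path):
    """Stream a bz2-compressed N-Triples file without unpacking it first."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return count_inlinks(handle)
```

Calling count_inlinks_in_file("pagelinks_en.nt.bz2") on a
local copy of [6] would return a Counter keyed by target
URI. The file is streamed line by line, so only the
per-page counts are kept in memory.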

Cheers,
Christopher


[1] http://download.wikipedia.org/enwiki/20091026/
[2] http://lists.wikimedia.org/pipermail/wikitech-l/2007-September/033499.html
[3] http://stats.grok.se/about
[4] http://dammit.lt/wikistats/
[5] http://download.wikipedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
[6] http://downloads.dbpedia.org/3.4/en/pagelinks_en.nt.bz2
[7] http://dbpedia.neofonie.de



On Tue, Nov 10, 2009 at 23:26, Richard Cyganiak <[email protected]> wrote:
> Hi,
>
> I was wondering if the following data is available anywhere as part of
> DBpedia, or otherwise if there's any hope of getting it from DBpedia
> in the future. I think, but I'm not sure, that the raw data should be
> available in the Wikipedia database dumps.
>
> 1. View counts for Wikipedia pages.
>
> 2. Total number of edits for each Wikipedia page.
>
> 3. Inlink counts for Wikipedia pages.
>
> The first two are attention data. That's an interesting aspect of
> Wikipedia that isn't fully exploited yet. There are interesting
> applications where I could learn stuff about my own dataset by meshing
> it up with attention data from DBpedia. The third one is, in some way,
> also a measure of attention, and can be useful for ranking.
>
> (I'm thinking about stuff that can be done with the New York Times
> SKOS dataset, and using attention data from Wikipedia to gain insight
> into the NYT data might be quite interesting.)
>
> So, any hint about how to get the data above would be appreciated.
>
> Best,
> Richard
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
