> current versions only." I don't think there is a dump
> of all pages.
Silly me. I meant "of all revisions", but anyway that's
not what you need. I think the edits are in this dump:
http://download.wikipedia.org/enwiki/latest/enwiki-latest-stub-meta-history.xml.gz
It's the largest of all those huge files, at about 11 GB.
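Counting per-page edits from that dump boils down to streaming the XML and tallying <revision> elements per <page>. A rough sketch (assuming the standard MediaWiki export layout of the stub-meta-history files; untested against the full 11 GB dump):

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

def count_edits(xml_stream):
    """Count revisions per page title in a MediaWiki export XML stream.

    Assumes the usual <page>/<title>/<revision> nesting of the
    stub-meta-history dumps; namespace prefixes are stripped so the
    sketch works regardless of the export schema version.
    """
    edits = Counter()
    title = None
    revisions = 0
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            revisions += 1
        elif tag == "page":
            edits[title] += revisions
            revisions = 0
            elem.clear()  # free memory; the dump is far too big to hold
    return edits

# Usage on the real dump, decompressed on the fly:
# with gzip.open("enwiki-latest-stub-meta-history.xml.gz", "rb") as f:
#     edits = count_edits(f)
```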

On Tue, Nov 10, 2009 at 23:54, Jona Christopher Sahnwaldt
<[email protected]> wrote:
> Hi Richard,
>
> I'm pretty sure that the first two are not available
> in the Wikipedia dumps. For example, [1]
> lists pages-meta-current.xml.bz2 as "All pages,
> current versions only." I don't think there is a dump
> of all pages. There may once have been one, but it
> probably became too big.
>
> For the view count, see [2]. But hey, I also found
> the following at [3]: "Domas Mituzas put together a
> system to gather access statistics from wikipedia's
> squid cluster and publishes it here" [4].
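Aggregating those hourly wikistats files is a few lines of code, assuming they use a space-separated "project title count bytes" line format (worth double-checking against the actual files):

```python
from collections import Counter

def parse_pagecounts(lines, project="en"):
    """Sum view counts for one project from a wikistats pagecounts file.

    Assumes the space-separated "project title count bytes" format;
    adjust the parsing if the published files differ.
    """
    views = Counter()
    for line in lines:
        parts = line.split(" ")
        if len(parts) == 4 and parts[0] == project:
            views[parts[1]] += int(parts[2])
    return views
```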
>
> The inlink count can of course be extracted, either
> from the Wikipedia dump [5] or from DBpedia [6].
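The DBpedia route is probably the easiest: the pagelinks file [6] is N-Triples, so the inlink count of a page is just how often its URI appears as the triple's object. A minimal sketch for well-formed lines (no literals, comments skipped):

```python
from collections import Counter

def count_inlinks(lines):
    """Count incoming page links from an N-Triples pagelinks file.

    Each triple is "<source> <predicate> <target> ."; a page's inlink
    count is the number of triples in which it is the target.
    """
    inlinks = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(" ")
        if len(parts) >= 4 and parts[2].startswith("<"):
            inlinks[parts[2].strip("<>")] += 1
    return inlinks
```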
>
> I wrote a bit of Java code that does exactly that
> because I needed it for the faceted browser [7],
> but didn't publish the results. I can send you the
> code if you want, though it's not really "productized".
>
> Cheers,
> Christopher
>
>
> [1] http://download.wikipedia.org/enwiki/20091026/
> [2] http://lists.wikimedia.org/pipermail/wikitech-l/2007-September/033499.html
> [3] http://stats.grok.se/about
> [4] http://dammit.lt/wikistats/
> [5] http://download.wikipedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
> [6] http://downloads.dbpedia.org/3.4/en/pagelinks_en.nt.bz2
> [7] http://dbpedia.neofonie.de
>
>
>
> On Tue, Nov 10, 2009 at 23:26, Richard Cyganiak <[email protected]> wrote:
>> Hi,
>>
>> I was wondering if the following data is available anywhere as part of
>> DBpedia, or otherwise if there's any hope of getting it from DBpedia
>> in the future. I think, but I'm not sure, that the raw data should be
>> available in the Wikipedia database dumps.
>>
>> 1. View counts for Wikipedia pages.
>>
>> 2. Total number of edits for each Wikipedia page.
>>
>> 3. Inlink counts for Wikipedia pages.
>>
>> The first two are attention data. That's an interesting aspect of
>> Wikipedia that isn't fully exploited yet. There are interesting
>> applications where I could learn stuff about my own dataset by mashing
>> it up with attention data from DBpedia. The third one is, in some way,
>> also a measure of attention, and can be useful for ranking.
>>
>> (I'm thinking about stuff that can be done with the New York Times
>> SKOS dataset, and using attention data from Wikipedia to gain insight
>> into the NYT data might be quite interesting.)
>>
>> So, any hint about how to get the data above would be appreciated.
>>
>> Best,
>> Richard
>>
>> ------------------------------------------------------------------------------
>> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
>> trial. Simplify your report design, integration and deployment - and focus on
>> what you do best, core application coding. Discover what's new with
>> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
