> current versions only." I don't think there is a dump
> of all pages.

Silly me. I meant "of all revisions", but anyway that's not what you need.
I think the edits are in this dump:

http://download.wikipedia.org/enwiki/latest/enwiki-latest-stub-meta-history.xml.gz

It's the largest of all those huge files, at 11 GB.
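For what it's worth, the per-page edit counts can be tallied from that stub-meta-history dump with a streaming XML parser. Here's a minimal Python sketch; the function name and the memory-handling details are my own, and it assumes the standard MediaWiki export layout (`<page>` elements containing one `<title>` and one `<revision>` per edit):

```python
# Sketch: count revisions (edits) per page from a stub-meta-history dump.
# Assumes the standard MediaWiki XML export schema; names are illustrative.
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

def count_edits(xml_stream):
    """Return a Counter mapping page title -> number of <revision> elements."""
    counts = Counter()
    title = None
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            counts[title] += 1
        elif tag == "page":
            elem.clear()  # free memory; the dump is ~11 GB compressed
    return counts

# Usage against the gzipped dump:
# with gzip.open("enwiki-latest-stub-meta-history.xml.gz", "rb") as f:
#     edits = count_edits(f)
```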
On Tue, Nov 10, 2009 at 23:54, Jona Christopher Sahnwaldt <[email protected]> wrote:
> Hi Richard,
>
> I'm pretty sure that the first two are not available
> in the Wikipedia dumps. For example, [1]
> lists pages-meta-current.xml.bz2 as "All pages,
> current versions only." I don't think there is a dump
> of all pages. There once may have been one, but it
> probably became too big.
>
> For the view count, see [2]. But hey, I also found
> the following at [3]: "Domas Mituzas put together a
> system to gather access statistics from wikipedia's
> squid cluster and publishes it here" [4].
>
> The inlink count can of course be extracted, either
> from the Wikipedia dump [5] or from DBpedia [6].
>
> I wrote a bit of Java code that does exactly that
> because I needed it for the faceted browser [7],
> but didn't publish the results. I can send you the
> code if you want, though it's not really "productized".
>
> Cheers,
> Christopher
>
>
> [1] http://download.wikipedia.org/enwiki/20091026/
> [2] http://lists.wikimedia.org/pipermail/wikitech-l/2007-September/033499.html
> [3] http://stats.grok.se/about
> [4] http://dammit.lt/wikistats/
> [5] http://download.wikipedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
> [6] http://downloads.dbpedia.org/3.4/en/pagelinks_en.nt.bz2
> [7] http://dbpedia.neofonie.de
>
>
> On Tue, Nov 10, 2009 at 23:26, Richard Cyganiak <[email protected]> wrote:
>> Hi,
>>
>> I was wondering if the following data is available anywhere as part of
>> DBpedia, or otherwise if there's any hope of getting it from DBpedia
>> in the future. I think, but I'm not sure, that the raw data should be
>> available in the Wikipedia database dumps.
>>
>> 1. View counts for Wikipedia pages.
>>
>> 2. Total number of edits for each Wikipedia page.
>>
>> 3. Inlink counts for Wikipedia pages.
>>
>> The first two are attention data. That's an interesting aspect of
>> Wikipedia that isn't fully exploited yet.
>> There are interesting
>> applications where I could learn stuff about my own dataset by mashing
>> it up with attention data from DBpedia. The third one is, in some way,
>> also a measure of attention, and can be useful for ranking.
>>
>> (I'm thinking about stuff that can be done with the New York Times
>> SKOS dataset, and using attention data from Wikipedia to gain insight
>> into the NYT data might be quite interesting.)
>>
>> So, any hint about how to get the data above would be appreciated.
>>
>> Best,
>> Richard
>>
>> ------------------------------------------------------------------------------
>> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
>> trial. Simplify your report design, integration and deployment - and focus on
>> what you do best, core application coding. Discover what's new with
>> Crystal Reports now. http://p.sf.net/sfu/bobj-july
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
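On point 3 above: the inlink counts can be tallied from the DBpedia pagelinks N-Triples dump [6] by counting how often each URI appears in the object position. A minimal Python sketch, assuming one triple per line in the form `<subject> <predicate> <object> .` with URI objects (the function name is illustrative, and the predicate URI is not checked):

```python
# Sketch: inlink counts from an N-Triples pagelinks dump.
# Assumes one triple per line, "<subject> <predicate> <object> .",
# and that objects are URIs (no whitespace inside them).
import bz2
from collections import Counter

def count_inlinks(lines):
    """Return a Counter mapping object URI -> number of triples pointing at it."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 4 and parts[2].startswith("<"):
            counts[parts[2].strip("<>")] += 1
    return counts

# Usage against the bzipped dump:
# with bz2.open("pagelinks_en.nt.bz2", "rt", encoding="utf-8") as f:
#     inlinks = count_inlinks(f)
```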

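And on point 1: the access statistics published at [4] are plain-text files, one line per page per hour. ASSUMING the commonly described `project page_title request_count bytes` line format (not confirmed in this thread; check an actual file first), summing views per page could look like:

```python
# Sketch: sum hourly page view counts from a wikistats "pagecounts" file.
# ASSUMPTION: each line reads "project page_title request_count bytes";
# verify this against an actual file from [4] before relying on it.
from collections import Counter

def sum_views(lines, project="en"):
    """Return a Counter mapping page title -> summed request count."""
    views = Counter()
    for line in lines:
        parts = line.split(" ")
        if len(parts) == 4 and parts[0] == project:
            try:
                views[parts[1]] += int(parts[2])
            except ValueError:
                continue  # skip lines with a malformed count field
    return views
```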