Hi,

On 01/05/2017 18:18, Nuria Ruiz wrote:
>> are there issues with using the data from the IA?
> Since that much predates our team record keeping of data issues the
> answer is that we do not know. Maybe someone in this list can chip in
> and we will add this answer to our dataset known issues which can be
> found here:
>
> https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw#Events_and_known_problems_since_2014-03-01

I should add that there are a handful of files in October 2011[1] that
are incorrect: they are not compressed and appear to be HTML pages
(they are also 92 KB files instead of ~85 MB).

Again, the files from Internet Archive[2] seem to be OK.

Cristian

[1] https://dumps.wikimedia.org/other/pagecounts-raw/2011/2011-10/
Specifically, the following:
  * pagecounts-20111008-180001.gz
  * pagecounts-20111008-190000.gz
  * pagecounts-20111008-200000.gz
  * pagecounts-20111008-210000.gz
  * pagecounts-20111008-220000.gz

[2] https://archive.org/details/wikipedia_visitor_stats_201110
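
For anyone who wants to check a local copy for the same problem, here is a minimal Python sketch. It assumes the files have already been downloaded into one directory and flags any `pagecounts-*.gz` file that does not begin with the two-byte gzip magic number (0x1f 0x8b), which is what an uncompressed HTML page saved under a `.gz` name would look like. The function names and the directory layout are illustrative assumptions, not part of any Wikimedia tooling.

```python
from pathlib import Path

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def find_suspect_files(directory, pattern="pagecounts-*.gz"):
    """Return the sorted names of matching files that are not gzip data.

    `directory` is a hypothetical local mirror of one month of dumps.
    """
    return sorted(
        p.name for p in Path(directory).glob(pattern) if not looks_like_gzip(p)
    )
```

Calling `find_suspect_files("2011-10")` on a local mirror would list the file names worth re-fetching from another source. A size check (files far below the usual ~85 MB) could be added as a second heuristic, but the magic-number test alone should catch HTML pages saved in place of the real data.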
