Hi,

On 01/05/2017 18:18, Nuria Ruiz wrote:
>> are there issues with using the data from the IA? 
> Since that much predates our team record keeping of data issues the
> answer is that we do not know. Maybe someone in this list can chip in
> and we will add this answer to our dataset known issues which can be
> found here:
> 
> https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw#Events_and_known_problems_since_2014-03-01

I should add that a handful of files from October 2011[1] are
incorrect: they are not gzip-compressed and appear to be HTML pages
(they are also only 92 KB each, instead of the usual ~85 MB).

Again, the files from Internet Archive[2] seem to be OK.
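
For anyone who wants to check their own copies, here is a minimal
sketch in Python of one way to spot such files. It only looks at the
first bytes of each file: a real gzip stream starts with the two magic
bytes 0x1f 0x8b, while the broken files would start with HTML markup.
The function names are my own (hypothetical), and the HTML test is a
simple heuristic that I am assuming is good enough here, not a
validated check.

```python
import sys

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def classify(head: bytes) -> str:
    """Classify a file from its first bytes: 'gzip', 'html' or 'unknown'."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    # Heuristic (assumption): an HTML page starts with a doctype or <html>
    # tag, possibly preceded by whitespace.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    return "unknown"


def classify_file(path: str, nbytes: int = 512) -> str:
    """Read the first nbytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return classify(f.read(nbytes))


if __name__ == "__main__":
    # Paths are taken from the command line, one result line per file.
    for path in sys.argv[1:]:
        print(f"{path}: {classify_file(path)}")
```

Saved under any name you like and given the downloaded files as
arguments, it prints one line per file; anything not reported as
"gzip" deserves a closer look.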

Cristian

[1] https://dumps.wikimedia.org/other/pagecounts-raw/2011/2011-10/
Specifically, the following:
* pagecounts-20111008-180001.gz
* pagecounts-20111008-190000.gz
* pagecounts-20111008-200000.gz
* pagecounts-20111008-210000.gz
* pagecounts-20111008-220000.gz
[2] https://archive.org/details/wikipedia_visitor_stats_201110

_______________________________________________
Analytics mailing list
https://lists.wikimedia.org/mailman/listinfo/analytics
