Dear Dan,

I just read your thread, and it's much clearer to me now. I will chime in on the possible cleanup. What matters is keeping the old data available in some form, even if it is imperfect. There's no such thing as perfect data.
Best regards,
Maurice

On Thu, Dec 24, 2015 at 2:45 PM, Dan Andreescu <[email protected]> wrote:
> Maurice, the per-article data is available starting in October:
> http://dumps.wikimedia.org/other/pageviews/2015/2015-10/
>
> We have the data, so we could backfill back to May 2015, but not beyond
> that. The backfill process would take quite a long time, however, because
> there's a lot of data to crunch through. So we're holding off on that
> backfilling until people ask us for it. So do ask for it if that's useful.
>
> If you need data going further back, we have the pagecounts-all-sites
> dataset, which at least includes mobile data as well. This is available
> per article starting in late 2014.
>
> As you can see, this is confusing, which is why I just started a thread on
> this list about simplifying it. Please chime in there if you have an
> opinion about the cleanup.
>
> On Thu, Dec 24, 2015 at 8:41 AM, Maurice Vergeer <[email protected]> wrote:
>
>> Dear Dan,
>>
>> Thanks for this information. I looked at the new data. Are these data
>> also available per subject, or only aggregated for the entire site? I am
>> mainly interested in statistics about specific pages in Wikipedia.
>> For now the traditional measurements are sufficient, i.e. I assume spider
>> traffic is more or less random on a daily basis, and because I will do
>> time-series analysis, this will have little effect on the findings.
>>
>> On a similar topic: when a user is logged in to edit a page and
>> refreshes the Wikipedia page, does this register as a visit? Or do only
>> logged-out visits register as a visit?
>>
>> To conclude, I think Wikipedia data are very interesting for scientific
>> study. I've seen some studies in information science, but in my field,
>> communication science, very little. I hope to change that with my
>> contribution :-)
>>
>> Again, thanks Dan,
>> Maurice
>>
>> On Thu, Dec 24, 2015 at 2:07 PM, Dan Andreescu <[email protected]> wrote:
>>
>>> Maurice, if you're looking for recent data, we have a better source:
>>> dumps.wikimedia.org/other/pageviews/
>>>
>>> This is better than pagecounts-raw because it excludes spider (crawler)
>>> traffic and some other automated traffic, and it includes mobile traffic.
>>> We have been slow to announce this and change the pages because it's a
>>> confusing change.
>>>
>>> On Thu, Dec 24, 2015 at 4:44 AM, Maurice Vergeer <[email protected]> wrote:
>>>
>>>> Dear Federico Leva,
>>>>
>>>> Thank you very much for the clarification and the quick reply.
>>>>
>>>> Because I want to relate the pagecounts to events taking place in the
>>>> Netherlands (e.g. televised political debates), knowing which time zone
>>>> is used is important.
>>>>
>>>> Again thanks and best regards,
>>>> Maurice Vergeer
>>>>
>>>> On Thu, Dec 24, 2015 at 10:38 AM, Federico Leva (Nemo) <[email protected]> wrote:
>>>>
>>>>> Maurice Vergeer, 24/12/2015 10:16:
>>>>>
>>>>>> I am looking at your pagecounts as archived on
>>>>>> https://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-12/
>>>>>> Can you tell me which time zone the timestamps are in?
>>>>>
>>>>> Any and all timestamps in dumps.wikimedia.org are in UTC. Apparently
>>>>> this is not as obvious as generally thought, so I've added a note:
>>>>> https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FData%2FPagecounts-raw&type=revision&diff=240313&oldid=184255
>>>>>
>>>>> Nemo
>>>>>
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>> --
>>>> ________________________________________________
>>>> Maurice Vergeer
>>>> To contact me, see http://mauricevergeer.nl/node/5
>>>> To see my publications, see http://mauricevergeer.nl/node/1
>>>> ________________________________________________
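Because every timestamp on dumps.wikimedia.org is in UTC, hourly counts have to be shifted before they can be lined up with Dutch events such as televised debates. Here is a minimal Python sketch of that conversion. It assumes hourly file names of the form <prefix>-YYYYMMDD-HHMMSS.gz (e.g. pagecounts-20151201-200000.gz); the helper names dump_hour_utc and to_netherlands are hypothetical, and whether a file's timestamp marks the start or the end of the hour it covers should be checked against the dataset documentation.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Dutch local time: CET (UTC+1) in winter, CEST (UTC+2) in summer.
AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def dump_hour_utc(filename: str) -> datetime:
    """Parse the UTC timestamp embedded in an hourly dump file name.

    Assumption: names look like "<prefix>-YYYYMMDD-HHMMSS.gz".
    """
    stem = filename.rsplit(".", 1)[0]          # drop the ".gz" extension
    _, date_part, time_part = stem.split("-")  # prefix, YYYYMMDD, HHMMSS
    naive = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    # Timestamps on dumps.wikimedia.org are UTC, so attach that zone explicitly.
    return naive.replace(tzinfo=timezone.utc)


def to_netherlands(ts_utc: datetime) -> datetime:
    """Convert an aware UTC timestamp to Dutch local time (DST-aware)."""
    return ts_utc.astimezone(AMSTERDAM)


if __name__ == "__main__":
    print(to_netherlands(dump_hour_utc("pagecounts-20151201-200000.gz")))
    # 2015-12-01 21:00:00+01:00
    print(to_netherlands(dump_hour_utc("pageviews-20150601-200000.gz")))
    # 2015-06-01 22:00:00+02:00
```

Using the Europe/Amsterdam zone rather than a fixed one-hour offset matters for time-series work: it switches between CET and CEST automatically, so the same 20:00 UTC hour maps to 21:00 local time in December but 22:00 in June.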
