The difference is very small, but you're right to point it out. I've opened a task to look into it: https://phabricator.wikimedia.org/T205457
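
To put a number on "very small", here is a minimal sketch that compares the figures quoted below. It makes no network calls. The helper names (parse_count, aggregate_url, compare) are made up for illustration and are not part of any Wikimedia tooling; the URL layout is copied from links [4] and [5] in the quoted message.

```python
# Illustrative sketch only: helper names are hypothetical, not Wikimedia tooling.

API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def parse_count(text: str) -> int:
    """Parse a count written with '.' as thousands separator, e.g. '238.845.634'."""
    return int(text.replace(".", ""))


def aggregate_url(project: str, start: str, end: str,
                  access: str = "all-access", agent: str = "user",
                  granularity: str = "daily") -> str:
    """Build an aggregate pageview URL in the same layout as links [4] and [5]."""
    return f"{API_BASE}/{project}/{access}/{agent}/{granularity}/{start}/{end}"


def compare(api_text: str, dump_text: str) -> tuple[int, float]:
    """Return (absolute difference, difference as a percentage of the dump total)."""
    api = parse_count(api_text)
    dump = parse_count(dump_text)
    diff = api - dump
    return diff, diff / dump * 100


if __name__ == "__main__":
    # Figures for October 1, 2015, as quoted in the original message.
    samples = {
        "en.wikipedia": ("238.845.634", "238.840.836"),
        "pt.wikipedia": ("11.390.043", "11.389.979"),
    }
    for project, (api_text, dump_text) in samples.items():
        diff, pct = compare(api_text, dump_text)
        print(aggregate_url(project, "2015100100", "2015100100"))
        print(f"  API minus dumps: {diff} ({pct:.5f}% of the dump total)")
```

With the quoted figures, the API total exceeds the dump total by 4798 views for en.wikipedia and by 64 views for pt.wikipedia, which is well under 0.01% in both cases.
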

On Wed, Sep 19, 2018 at 5:10 PM Felix J. Scholz <[email protected]> wrote:

> Hey,
>
> I've been looking through the documentation on the pageview api in recent
> days, and have a question that I have not been able to come up with a
> solution to so far.
>
> Per my understanding, the data accessible through the "aggregated by
> project" pageview api [1], when filtered to just query "user" agents,
> should return the same results as can be found in the hourly pageview dumps
> data [2 / 3].
>
> However, while the data is close, in two of my brief tests (for the data
> of October 1, 2015) the values did not match up.
>
> Data from "aggregate" API:
> en.wikipedia & excluding spiders [4]: 238.845.634
> pt.wikipedia & excluding spiders [5]: 11.390.043
>
> Data from pageview dumps [3]:
> en & en.zero & en.m: 238.840.836
> pt & pt.zero & pt.m: 11.389.979
>
> As you can see while the values are close, they do not match.
>
> What am I missing here? Am I maybe mistaken in the notion that the two
> data sources are providing data from the same source and thus should be
> compatible?
>
> Felix
>
> [1] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
> [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageviews
> [3] https://dumps.wikimedia.org/other/pageviews/
> [4] https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/en.wikipedia/all-access/user/daily/2015100100/2015100100
> [5] https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/pt.wikipedia/all-access/user/daily/2015100100/2015100100
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
