Hello,

I do not think our downloads or API provide a dataset like the one you are
interested in. From your question I get the feeling that your assumptions
about how our system works do not match reality; Wikipedia might not be the
best fit for your study.

The closest data to what you are asking for might be this one:
https://analytics.wikimedia.org/datasets/archive/public-datasets/analytics/caching/README
I would read this ticket to understand the internals of the dataset:
https://phabricator.wikimedia.org/T128132
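
If aggregated per-page counts would be of any use to you, the Pageviews
REST API is probably the closest public interface. Note that it returns
daily or monthly totals per article, not per-request traces, so it has no
web server id, page size, or per-hit timestamp. A minimal Python sketch
(the article name and date range below are placeholders):

    import requests

    # Wikimedia Pageviews REST API: aggregated view counts per article.
    # Granularity is daily or monthly; there is no per-request detail.
    URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
           "{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}")

    resp = requests.get(
        URL.format(
            project="en.wikipedia",  # one language, as you asked
            access="all-access",
            agent="all-agents",
            article="Main_Page",     # placeholder article
            granularity="daily",
            start="20180101",        # YYYYMMDD
            end="20180131",
        ),
        headers={"User-Agent": "research-script/0.1 (you@example.org)"},
    )
    resp.raise_for_status()

    # Each item carries a YYYYMMDDHH timestamp and a view count.
    for item in resp.json()["items"]:
        print(item["timestamp"], item["article"], item["views"])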

Thanks,

Nuria



On Mon, Apr 9, 2018 at 10:48 AM, Ta-Yuan Hsu <th...@uic.edu> wrote:

> Dear all,
>
>    Since we are studying workloads that include a sample of Wikipedia's
> traffic over a certain period of time, what we need are patterns of user
> access to web servers in a decentralized hosting environment. The access
> patterns need to include real hits on the servers over time for one
> language. In other words, each trace record we require should contain at
> least four features: timestamp (like MM:DD:SS), web server id, page size,
> and operation (e.g., create, read, or update a page).
>
>    We have already reviewed some of the available datasets for download,
> such as https://dumps.wikimedia.org/other/pagecounts-raw/. However, they
> do not match our requirements. Does anyone know whether it is possible to
> download a dataset with these four features from the Wikimedia site, or
> should we use the REST API to acquire it? Thank you!
> --
> Sincerely,
> TA-YUAN
>
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
