We are studying workloads that include a sample of Wikipedia's traffic
over a certain period of time, so we need patterns of user access to web
servers in a decentralized hosting environment. The access patterns need
to include the actual hits on the servers over time for one language.
In other words, each trace record we require should contain at least
four features: timestamp (like HH:MM:SS), web server id, page size, and
operation (e.g., create, read, or update a page).
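To make the requirement concrete, here is a rough sketch of the record shape we have in mind. The field names, the tab-separated layout, and the sample values are our own placeholders for illustration only; they are not taken from any existing Wikimedia dataset or schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Operation(Enum):
    """The page operations we would like to distinguish (hypothetical labels)."""
    CREATE = "create"
    READ = "read"
    UPDATE = "update"


@dataclass(frozen=True)
class TraceRecord:
    """One trace record with the four features we are asking about."""
    timestamp: datetime      # when the request hit the server
    server_id: str           # which web server handled it
    page_size_bytes: int     # size of the page involved
    operation: Operation     # create, read, or update


def parse_record(line: str) -> TraceRecord:
    """Parse one tab-separated line in the assumed (hypothetical) layout:
    ISO timestamp, server id, page size in bytes, operation."""
    ts, server_id, size, op = line.rstrip("\n").split("\t")
    return TraceRecord(
        timestamp=datetime.fromisoformat(ts),
        server_id=server_id,
        page_size_bytes=int(size),
        operation=Operation(op),
    )


# Made-up sample line, only to show the shape of a record.
example = parse_record("2016-01-01T12:34:56\tserver-07\t20480\tread")
```

Any dataset that can be mapped onto these four fields, even with different names or a coarser timestamp, would be a useful starting point for us.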
We have already reviewed some of the datasets available for download,
such as https://dumps.wikimedia.org/other/pagecounts-raw/. However, they
do not match our requirements. Does anyone know whether it is possible to
download a dataset with these four features from the Wikimedia website?
Or should we use the REST API to acquire it? Thank you!
Analytics mailing list