Hi Maarten,
>> 3) read/write access to a shared staging DB that can be used as scratch
>> space for temporary tables (similar to the staging DB on s1-analytics). If
>> you create tables on staging, please prefix them with your shell user id
>> (e.g. dartar_foo).
> You might want to start using the toolserver/toollabs convention that if you
> add _p database, it can be viewed by anyone. That way you can mark databases
> that don't contain private information and might be opened up to more people
> in the future.
In fact, on s1-analytics we have two separate databases:
• “staging” is a sandbox for researchers to store all kinds of temporary
datasets, many of which are not meant to be permanently retained or documented
• “prod” is meant to host well-documented datasets that do not contain private
information and are kosher for publication
We have several projects in the pipeline to generate datasets of analytics
interest that we would like to expose to labs. These include:
• a master dataset of total monthly contributions by user by namespace by
project https://trello.com/c/3ecjp9aM/237-master-monthly-editor-activity-data
• a curated dataset of historical user registration times
https://trello.com/c/NB1WO9fM/315-historical-user-registration-data
• a dataset with revert metadata
https://trello.com/c/FZd4UIcR/29-revert-tracking-and-revert-dump-generation
We also have specs for new server-side logs that will cleanly track page
creations, page moves and page deletions:
https://trello.com/c/aKzWq1e3/259-create-schemas-for-page-creation-moves-and-deletions
Finally, we’re discussing how to expose to labs the existing EventLogging
schemas that contain public data and should be made publicly available. I
don’t have a definite ETA for each of these projects, but I’ll make sure we
post announcements on the lists as soon as new data becomes available.
Dario
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics