Hello,

We have added a footer to dumps pages with the CC-0 note. Please see:
https://dumps.wikimedia.org/other/analytics/

For other changes that you think are needed please do file a phab ticket.

Thanks,

Nuria

On Tue, Feb 11, 2020 at 2:50 PM Nuria Ruiz <[email protected]> wrote:

> Regarding Licensing, there is already a ticket:
> https://phabricator.wikimedia.org/T244685
>
> If you take a look at the bottom of wikistats (https://stats.wikimedia.org/v2)
> you will see that dedication is CC0, the data in both systems is the same
> but, of course, it can be made more explicit.
>
> Thanks,
>
> Nuria
>
> On Tue, Feb 11, 2020 at 12:48 PM Leila Zia <[email protected]> wrote:
>
>> Hi Joseph and team,
>>
>> summary: congratulations and some suggestions/requests.
>>
>> I second and third Nate and Neil. Congratulations on meeting this
>> milestone. This effort can empower the research community to spend
>> less time on joining datasets and trying to resolve existing, known
>> (to some) and complex issues with mediawiki history data and instead
>> spend time doing the research. Nice! :)
>>
>> I'm eager to see what the dataset(s) will be used for by others. On my
>> end, I am looking forward to seeing more research on how Wiki(m|p)edia
>> projects have evolved over the past almost 2 decades now that this
>> data is more readily available for studying. What we learn from the
>> Wikimedia projects and their evolution can be helpful in understanding
>> the broader web ecosystem and its evolution as well (as the Web is
>> only 30 years old now).
>>
>> I have some requests if I may:
>>
>> * Pine brings up a good point about licenses. It would be great to
>> make that clear in the documentation page(s). There are many examples
>> of this (that you know better than I), just in case, I find the
>> License section of
>> https://iccl.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en
>> informative, for example.
>>
>> * The other request I have is that you make the template for citing
>> this data-set clear to the end-user in your documentation pages
>> (including readme).
>> You can do this in a few different ways:
>>
>> ** In the documentation pages, put a suggested citation link. For
>> example (for bibtex):
>>
>> @misc{wmfanalytics2020mediawikihistory,
>> title = {MediaWiki History},
>> author = {nameoftheauthors},
>> howpublished = "\url{https://dumps.wikimedia.org/other/mediawiki_history/}",
>> note = {Accessed on date x},
>> year = {2020}
>> }
>>
>> ** Upload a paper about the work on arxiv.org. This way, your work
>> gets a DOI that you can use in your documentation pages for folks to
>> use for citation. Note that this step can be relatively light-weight.
>> (no peer-review in this case and it's relatively quick.)
>>
>> ** Submit the paper to a conference. Some conferences have a data-set
>> paper track where you publish about the dataset you release. Research
>> is happy to support you with guidance if you need it and if you choose
>> to go down this path. This takes some more time and in return it will
>> give you a "peer-review" stamp and more experience in publishing if
>> you like that.
>>
>> Unless you like publishing your work in a peer-reviewed venue, I
>> suggest one of the first two approaches.
>>
>> * I'm not sure if you intend to make the dataset more discoverable
>> through places such as https://datasetsearch.research.google.com/ .
>> You may want to consider that.
>>
>> Thanks,
>> Leila
>>
>> --
>> Leila Zia
>> Head of Research
>> Wikimedia Foundation
>>
>> On Mon, Feb 10, 2020 at 9:28 PM Pine W <[email protected]> wrote:
>> >
>> > I was thinking about the licensing issue some more. Apparently there
>> > was a relevant United States court case regarding metadata several
>> > years ago in the United States, but it's unclear to me from my brief
>> > web search whether this holding would apply to metadata from every
>> > nation. Also, I don't know if the underlying statutes have changed
>> > since the time of that ruling.
>> > I think that WMF Legal should be
>> > consulted regarding the copyright status of the metadata. Also, I
>> > think that the licensing of metadata should be explicitly addressed in
>> > the Terms of Use or a similar document which is easily accessible to
>> > all contributors to Wikimedia sites.
>> >
>> > Pine
>> > ( https://meta.wikimedia.org/wiki/User:Pine )
>> >
>> > On Tue, Feb 11, 2020 at 12:17 AM Pine W <[email protected]> wrote:
>> > >
>> > > Hi Joseph,
>> > >
>> > > Thanks for this announcement.
>> > >
>> > > I am looking for license information regarding the dumps, and I'm not
>> > > finding it in the pages that you linked at [1] or [2]. The license
>> > > that applies to text on Wikimedia sites is often CC-BY-SA 3.0, and the
>> > > WMF Terms of Use at https://foundation.wikimedia.org/wiki/Terms_of_Use
>> > > do not appear to provide any exception for metadata. In the absence of
>> > > a specific license, I think that the CC-BY-SA or other relevant
>> > > licenses would apply to the metadata, and that the licensing
>> > > information should be prominently included on relevant pages and in
>> > > the dumps themselves.
>> > >
>> > > What do you think?
>> > >
>> > > Pine
>> > > ( https://meta.wikimedia.org/wiki/User:Pine )
>> > >
>> > > On Mon, Feb 10, 2020 at 4:28 PM Joseph Allemandou
>> > > <[email protected]> wrote:
>> > > >
>> > > > Hi Analytics People,
>> > > >
>> > > > The Wikimedia Analytics Team is pleased to announce the release of
>> > > > the most complete dataset we have to date to analyze content and
>> > > > contributors metadata: Mediawiki History [1] [2].
>> > > >
>> > > > Data is in TSV format, released monthly around the 3rd of the month
>> > > > usually, and every new release contains the full history of metadata.
>> > > >
>> > > > The dataset contains an enhanced [3] and historified [4] version of
>> > > > user, page and revision metadata and serves as a base to Wikistats API
>> > > > on edits, users and pages [5] [6].
>> > > >
>> > > > We hope you will have as much fun playing with the data as we have
>> > > > building it, and we're eager to hear from you [7], whether for issues,
>> > > > ideas or usage of the data.
>> > > >
>> > > > Analytically yours,
>> > > >
>> > > > --
>> > > > Joseph Allemandou (joal) (he / him)
>> > > > Sr Data Engineer
>> > > > Wikimedia Foundation
>> > > >
>> > > > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
>> > > > [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
>> > > > [3] Many pre-computed fields are present in the dataset, from
>> > > > edit-counts by user and page to reverts and reverted information, as
>> > > > well as time between events.
>> > > > [4] As accurate as possible historical usernames and page-titles
>> > > > (as well as user-groups and blocks) are available in addition to
>> > > > current values, and are provided in a denormalized way to every event
>> > > > of the dataset.
>> > > > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
>> > > > [6] https://wikimedia.org/api/rest_v1/
>> > > > [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
>> > > > _______________________________________________
>> > > > Analytics mailing list
>> > > > [email protected]
>> > > > https://lists.wikimedia.org/mailman/listinfo/analytics
