Hello,

We have added a footer to dumps pages with the CC-0 note. Please see:
https://dumps.wikimedia.org/other/analytics/

For any other changes you think are needed, please file a Phabricator ticket.

Thanks,

Nuria

On Tue, Feb 11, 2020 at 2:50 PM Nuria Ruiz <[email protected]> wrote:

> Regarding Licensing, there is already a ticket:
> https://phabricator.wikimedia.org/T244685
>
> If you take a look at the bottom of Wikistats (https://stats.wikimedia.org/v2)
> you will see that the dedication is CC0. The data in both systems is the same
> but, of course, it can be made more explicit.
>
> Thanks,
>
> Nuria
>
>
>
> On Tue, Feb 11, 2020 at 12:48 PM Leila Zia <[email protected]> wrote:
>
>> Hi Joseph and team,
>>
>> summary: congratulations and some suggestions/requests.
>>
>> I second and third Nate and Neil. Congratulations on meeting this
>> milestone. This effort can empower the research community to spend
>> less time on joining datasets and trying to resolve existing, known
>> (to some) and complex issues with mediawiki history data and instead
>> spend time doing the research. Nice! :)
>>
>> I'm eager to see what the dataset(s) will be used for by others. On my
>> end, I am looking forward to seeing more research on how Wiki(m|p)edia
>> projects have evolved over the past almost 2 decades now that this
>> data is more readily available for studying. What we learn from the
>> Wikimedia projects and their evolution can be helpful in understanding
>> the broader web ecosystem and its evolution as well (as the Web is
>> only 30 years old now).
>>
>> I have some requests if I may:
>>
>> * Pine brings up a good point about licenses. It would be great to
>> make that clear in the documentation page(s). There are many examples
>> of this (which you know better than I do). Just in case, I find the
>> License section of
>> https://iccl.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en
>> informative.
>>
>> * The other request I have is that you make the template for citing
>> this data-set clear to the end-user in your documentation pages
>> (including readme). You can do this in a few different ways:
>>
>> ** In the documentation pages, put a suggested citation link. For
>> example (for bibtex):
>>
>> @misc{wmfanalytics2020mediawikihistory,
>>   title = {MediaWiki History},
>>   author = {nameoftheauthors},
>>   howpublished = "\url{https://dumps.wikimedia.org/other/mediawiki_history/}",
>>   note = {Accessed on date x},
>>   year={2020}
>> }
>>
>> ** Upload a paper about the work on arxiv.org. This way, your work
>> gets a DOI that you can use in your documentation pages for folks to
>> use for citation. Note that this step can be relatively light-weight.
>> (no peer-review in this case and it's relatively quick.)
>>
>> ** Submit the paper to a conference. Some conferences have a data-set
>> paper track where you publish about the dataset you release. Research
>> is happy to support you with guidance if you need it and if you choose
>> to go down this path. This takes some more time and in return it will
>> give you a "peer-review" stamp and more experience in publishing if
>> you like that.
>>
>> Unless you like publishing your work in a peer-reviewed venue, I
>> suggest one of the first two approaches.
>>
>> * I'm not sure if you intend to make the dataset more discoverable
>> through places such as https://datasetsearch.research.google.com/ .
>> You may want to consider that.
>>
>> Thanks,
>> Leila
>>
>> --
>> Leila Zia
>> Head of Research
>> Wikimedia Foundation
>>
>> On Mon, Feb 10, 2020 at 9:28 PM Pine W <[email protected]> wrote:
>> >
>> > I was thinking about the licensing issue some more. Apparently there
>> > was a relevant United States court case regarding metadata several
>> > years ago, but it's unclear to me from my brief
>> > web search whether this holding would apply to metadata from every
>> > nation. Also, I don't know if the underlying statutes have changed
>> > since the time of that ruling. I think that WMF Legal should be
>> > consulted regarding the copyright status of the metadata. Also, I
>> > think that the licensing of metadata should be explicitly addressed in
>> > the Terms of Use or a similar document which is easily accessible to
>> > all contributors to Wikimedia sites.
>> >
>> > Pine
>> > ( https://meta.wikimedia.org/wiki/User:Pine )
>> >
>> > On Tue, Feb 11, 2020 at 12:17 AM Pine W <[email protected]> wrote:
>> > >
>> > > Hi Joseph,
>> > >
>> > > Thanks for this announcement.
>> > >
>> > > I am looking for license information regarding the dumps, and I'm not
>> > > finding it in the pages that you linked at [1] or [2]. The license
>> > > that applies to text on Wikimedia sites is often CC-BY-SA 3.0, and the
>> > > WMF Terms of Use at
>> > > https://foundation.wikimedia.org/wiki/Terms_of_Use
>> > > do not appear to provide any exception for metadata. In the absence of
>> > > a specific license, I think that the CC-BY-SA or other relevant
>> > > licenses would apply to the metadata, and that the licensing
>> > > information should be prominently included on relevant pages and in
>> > > the dumps themselves.
>> > >
>> > > What do you think?
>> > >
>> > > Pine
>> > > ( https://meta.wikimedia.org/wiki/User:Pine )
>> > >
>> > > On Mon, Feb 10, 2020 at 4:28 PM Joseph Allemandou
>> > > <[email protected]> wrote:
>> > > >
>> > > > Hi Analytics People,
>> > > >
>> > > > The Wikimedia Analytics Team is pleased to announce the release of
>> > > > the most complete dataset we have to date to analyze content and
>> > > > contributors metadata: Mediawiki History [1] [2].
>> > > >
>> > > > Data is in TSV format, usually released monthly around the 3rd of
>> > > > the month, and every new release contains the full history of metadata.
>> > > >
>> > > > The dataset contains an enhanced [3] and historified [4] version of
>> > > > user, page and revision metadata and serves as a base for the
>> > > > Wikistats API on edits, users and pages [5] [6].
>> > > >
>> > > > We hope you will have as much fun playing with the data as we have
>> > > > building it, and we're eager to hear from you [7], whether for issues,
>> > > > ideas or usage of the data.
>> > > >
>> > > > Analytically yours,
>> > > >
>> > > > --
>> > > > Joseph Allemandou (joal) (he / him)
>> > > > Sr Data Engineer
>> > > > Wikimedia Foundation
>> > > >
>> > > > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
>> > > > [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
>> > > > [3] Many pre-computed fields are present in the dataset, from
>> > > > edit-counts by user and page to reverts and reverted information,
>> > > > as well as time between events.
>> > > > [4] Historical usernames and page-titles (as well as user-groups
>> > > > and blocks), as accurate as possible, are available in addition to
>> > > > current values, and are provided in a denormalized way to every
>> > > > event of the dataset.
>> > > > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
>> > > > [6] https://wikimedia.org/api/rest_v1/
>> > > > [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
>> > > > _______________________________________________
>> > > > Analytics mailing list
>> > > > [email protected]
>> > > > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>>
>>
>
