Hi Giovanni,

The pagelinks table is great for point-in-time snapshots: it tells you which links exist between pages at the time of the query. Parsing the wikitext is needed to provide a historical view of the links :)

Cheers,
Joseph

On Tue, Feb 18, 2020 at 12:22 AM Giovanni Luca Ciampaglia <[email protected]> wrote:

> Thank you Joseph; great to hear there is interest in building such a
> dataset. You say that the link information would need to be parsed from
> wikitext, which is complicated; would the pagelinks table help as an
> alternative source of data?
>
> *Giovanni Luca Ciampaglia* ∙ glciampaglia.com
> Assistant Professor
> Computer Science and Engineering <https://www.usf.edu/engineering/cse/>
> ∙ University of South Florida <https://www.usf.edu/>
>
> *Due to Florida’s broad open records law, email to or from university
> employees is public record, available to the public and the media upon
> request.*
>
> On Thu, Feb 13, 2020 at 9:27 AM Joseph Allemandou <[email protected]>
> wrote:
>
> > Hi Giovanni,
> > Thank you for your message :)
> > You are correct that there is no information on page-to-page links as
> > of today, nor, for instance, on whether past revisions were redirects.
> > We share your view that such information is extremely valuable, and we
> > intend to extract it at some point.
> > It has not been done yet because those pieces of information are only
> > available by parsing the wikitext of every revision, which is not only
> > resource-intensive but also technically complicated (templates,
> > version changes, etc.).
> > You can be sure we will send another announcement when we release that
> > data :)
> > Best,
> >
> > On Tue, Feb 11, 2020 at 10:30 PM Giovanni Luca Ciampaglia
> > <[email protected]> wrote:
> >
> > > Hi Joseph,
> > >
> > > Thanks a lot for creating and sharing such a valuable resource. I
> > > went through the schema and from what I understand there is no
> > > information about page-to-page links, correct? Are there any
> > > resources that would provide such historical data?
> > >
> > > Best,
> > >
> > > *Giovanni Luca Ciampaglia* ∙ glciampaglia.com
> > > Assistant Professor
> > > Computer Science and Engineering
> > > <https://www.usf.edu/engineering/cse/> ∙ University of South Florida
> > > <https://www.usf.edu/>
> > >
> > > *Due to Florida’s broad open records law, email to or from
> > > university employees is public record, available to the public and
> > > the media upon request.*
> > >
> > > On Mon, Feb 10, 2020 at 11:28 AM Joseph Allemandou <[email protected]>
> > > wrote:
> > >
> > > > Hi Analytics People,
> > > >
> > > > The Wikimedia Analytics Team is pleased to announce the release of
> > > > the most complete dataset we have to date to analyze content and
> > > > contributor metadata: Mediawiki History [1] [2].
> > > >
> > > > Data is in TSV format, released monthly, usually around the 3rd of
> > > > the month, and every new release contains the full history of
> > > > metadata.
> > > >
> > > > The dataset contains an enhanced [3] and historified [4] version
> > > > of user, page and revision metadata and serves as the basis for
> > > > the Wikistats API on edits, users and pages [5] [6].
> > > >
> > > > We hope you will have as much fun playing with the data as we have
> > > > building it, and we're eager to hear from you [7], whether for
> > > > issues, ideas or usage of the data.
> > > >
> > > > Analytically yours,
> > > >
> > > > --
> > > > Joseph Allemandou (joal) (he / him)
> > > > Sr Data Engineer
> > > > Wikimedia Foundation
> > > >
> > > > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> > > > [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> > > > [3] Many pre-computed fields are present in the dataset, from
> > > > edit counts by user and page to reverts and reverted information,
> > > > as well as time between events.
> > > > [4] Historical usernames and page titles (as well as user groups
> > > > and blocks), as accurate as possible, are available in addition to
> > > > current values, and are provided in a denormalized way on every
> > > > event of the dataset.
> > > > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> > > > [6] https://wikimedia.org/api/rest_v1/
> > > > [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > [email protected]
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> >
> > --
> > Joseph Allemandou (joal) (he / him)
> > Sr Data Engineer
> > Wikimedia Foundation

--
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
