Hi Giovanni,
The pagelinks table is great for point-in-time snapshots: it tells you about
links between pages at the time of the query. Parsing the wikitext is needed to
provide a historical view of the links :)
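
To illustrate why the wikitext route is the one that gives history, here is a
minimal Python sketch, not part of any Wikimedia tooling. It assumes you
already have the raw wikitext of two revisions of the same page; the names
extract_links and link_changes are hypothetical. The regular expression only
sees links written literally as [[...]], so links produced by templates are
missed and File:/Category: links are not filtered out, which is part of why
doing this properly is complicated.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Simplifying assumption: only literal wikilinks are detected; links
# generated by templates or parser functions are invisible to this regex.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(wikitext):
    """Return the set of link targets written literally in one revision."""
    targets = set()
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip().replace("_", " ")
        if title:
            # On most wikis the first letter of a title is case-insensitive.
            targets.add(title[0].upper() + title[1:])
    return targets


def link_changes(old_wikitext, new_wikitext):
    """Return (added, removed) link targets between two revisions."""
    old = extract_links(old_wikitext)
    new = extract_links(new_wikitext)
    return sorted(new - old), sorted(old - new)


rev1 = "See [[Rome]] and [[italy|the country]]."
rev2 = "See [[Rome#History|Roman history]] and [[Florence]]."
added, removed = link_changes(rev1, rev2)
print("added:", added)
print("removed:", removed)
```

Running this over every revision of every page is what makes the approach
resource intensive, whereas pagelinks only answers the question for the
current state of each page.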
Cheers
Joseph

On Tue, Feb 18, 2020 at 12:22 AM Giovanni Luca Ciampaglia <[email protected]>
wrote:

> Thank you Joseph; great to hear there is interest in building such a
> dataset. You say that the link information would need to be parsed from
> wikitext, which is complicated; would the pagelinks table help as an
> alternative source of data?
>
> *Giovanni Luca Ciampaglia* ∙ glciampaglia.com
> Assistant Professor
> Computer Science and Engineering
> <https://www.usf.edu/engineering/cse/> ∙ University
> of South Florida <https://www.usf.edu/>
>
> *Due to Florida’s broad open records law, email to or from university
> employees is public record, available to the public and the media upon
> request.*
>
>
> On Thu, Feb 13, 2020 at 9:27 AM Joseph Allemandou <
> [email protected]>
> wrote:
>
> > Hi Giovanni,
> > Thank you for your message :)
> > You are correct in that there is no information on page-to-page links as of
> > today, nor, for instance, on historical values of revisions being
> > redirects.
> > We share with you the idea that such information is extremely valuable, and
> > we have in mind to be able to extract it at some point.
> > It has not been done yet because those pieces of information are only
> > available by parsing the wikitext of every revision, which is not only
> > resource intensive but also technically complicated (templates, version
> > changes etc).
> > You can be sure we will send another announcement when we release that
> > data :)
> > Best,
> >
> > On Tue, Feb 11, 2020 at 10:30 PM Giovanni Luca Ciampaglia <
> > [email protected]>
> > wrote:
> >
> > > Hi Joseph,
> > >
> > > Thanks a lot for creating and sharing such a valuable resource. I went
> > > through the schema and from what I understand there is no information about
> > > page-to-page links, correct? Are there any resources that would provide
> > > such historical data?
> > >
> > > Best,
> > >
> > > *Giovanni Luca Ciampaglia* ∙ glciampaglia.com
> > > Assistant Professor
> > > Computer Science and Engineering
> > > <https://www.usf.edu/engineering/cse/> ∙ University
> > > of South Florida <https://www.usf.edu/>
> > >
> > > *Due to Florida’s broad open records law, email to or from university
> > > employees is public record, available to the public and the media upon
> > > request.*
> > >
> > >
> > > On Mon, Feb 10, 2020 at 11:28 AM Joseph Allemandou <
> > > [email protected]> wrote:
> > >
> > > > Hi Analytics People,
> > > >
> > > > The Wikimedia Analytics Team is pleased to announce the release of the most
> > > > complete dataset we have to date to analyze content and contributors
> > > > metadata: Mediawiki History [1] [2].
> > > >
> > > > Data is in TSV format, usually released monthly around the 3rd of the
> > > > month, and every new release contains the full history of metadata.
> > > >
> > > > The dataset contains an enhanced [3] and historified [4] version of user,
> > > > page and revision metadata and serves as a base to the Wikistats API on edits,
> > > > users and pages [5] [6].
> > > >
> > > > We hope you will have as much fun playing with the data as we have building
> > > > it, and we're eager to hear from you [7], whether for issues, ideas or
> > > > usage of the data.
> > > >
> > > > Analytically yours,
> > > >
> > > > --
> > > > Joseph Allemandou (joal) (he / him)
> > > > Sr Data Engineer
> > > > Wikimedia Foundation
> > > >
> > > > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> > > > [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> > > > [3] Many pre-computed fields are present in the dataset, from edit-counts
> > > > by user and page to reverts and reverted information, as well as time
> > > > between events.
> > > > [4] Historical usernames and page-titles (as well as user-groups and
> > > > blocks), as accurate as possible, are available in addition to current
> > > > values, and are provided in a denormalized way to every event of the
> > > > dataset.
> > > > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> > > > [6] https://wikimedia.org/api/rest_v1/
> > > > [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > [email protected]
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > >
> > >
> >
> >
> > --
> > Joseph Allemandou (joal) (he / him)
> > Sr Data Engineer
> > Wikimedia Foundation
> >
>


-- 
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
