Hello Analytics,

The Data Engineering team will begin deploying[1] the changes that support
the Temp Accounts
<https://www.mediawiki.org/wiki/Trust_and_Safety_Product/Temporary_Accounts>
initiative in the Data Lake
<https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake> today,
Wednesday, January 22nd, 2025.
These changes do not activate the Temp Accounts feature on any wiki; they
only enable support for Temp Accounts in the Hadoop Data Lake.
We expect that some MediaWiki-related Data Lake tables[2] may be
temporarily unavailable over the next couple of days.
By the end of this process, the MediaWikiHistory tables and their derivative
tables will fully support the new Temp Accounts semantics and data.

As part of the deployment process we plan to re-run the jobs for the
2024-12 snapshot.
This means the data model for that snapshot will be updated.
The changes are mostly backwards compatible, except for:

   - The mediawiki_user_history table's `anonymous` field will be renamed
   to `is_anonymous`.
   - The geoeditors_edits_monthly table's `editors_are_anonymous` field
   will be renamed to `users_are_anonymous`.
   - The MediaWikiHistory dumps will have some new fields inserted, and the
   order of the existing fields will change.
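
For code that has to run against snapshots from both before and after the
renames, a small lookup like the following may help. This is only a sketch:
`RENAMED_FIELDS` and `resolve_field` are hypothetical names, not part of any
existing WMF library, and the list of available columns is assumed to come
from whatever schema inspection you already use.

```python
# Hypothetical mapping of old field names to new ones, taken from the
# renames listed above. Table names are given without the database prefix.
RENAMED_FIELDS = {
    "mediawiki_user_history": {"anonymous": "is_anonymous"},
    "geoeditors_edits_monthly": {"editors_are_anonymous": "users_are_anonymous"},
}


def resolve_field(table, old_name, available_columns):
    """Return the column name to query, preferring the new name if present."""
    new_name = RENAMED_FIELDS.get(table, {}).get(old_name, old_name)
    if new_name in available_columns:
        return new_name
    if old_name in available_columns:
        # Snapshot was produced before the rename.
        return old_name
    raise KeyError(f"neither {new_name!r} nor {old_name!r} found in {table}")
```

A query builder could then call, for example,
`resolve_field("mediawiki_user_history", "anonymous", columns)` and use the
returned name in its SELECT clause.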

We haven't found any existing code (within the WMF) that would break because
of these non-backwards-compatible changes, but if you find any, please let us
know.
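
If you parse the MediaWikiHistory dumps by column position, reading fields
by name instead should make the code robust to the reordering. A minimal
sketch, assuming the rows are tab-separated and that you supply the field
order yourself from the published schema; `read_rows_by_name` is a
hypothetical helper, and the field lists in the usage note are short
illustrative subsets, not the real dump schema:

```python
import csv
import io


def read_rows_by_name(tsv_text, field_names):
    """Yield each tab-separated row as a dict keyed by field name."""
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        # Pair each value with its field name so callers never rely on
        # a hard-coded column index.
        yield dict(zip(field_names, row))
```

With this approach, only the `field_names` list has to change when the
dump layout changes; lookups such as `row["event_type"]` stay the same.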

[1] Deployment plan
<https://docs.google.com/document/d/1-GhyLepEL7rqJlY1a2RKQ_1YI2QYgVpFSmzpq9nXIag/edit?tab=t.0>

[2] List of affected tables

   - wmf.mediawiki_history
   - wmf.mediawiki_user_history
   - wmf.mediawiki_page_history
   - wmf.mediawiki_history_reduced
   - wmf.edit_hourly
   - wmf.editors_daily
   - wmf.unique_editors_by_country
   - wmf.geoeditors_edits_monthly
   - wmf.geoeditors_monthly
   - wmf.geoeditors_public_monthly


-- 
Marcel Ruiz Forns (he/him)
Senior Software Engineer
_______________________________________________
Analytics mailing list -- [email protected]
To unsubscribe send an email to [email protected]
