I hadn’t been following along with this work, and this summary is
amazing. Thank you so much!

On Sat, Oct 10, 2020 at 21:14 Amir Sarabadani <ladsgr...@gmail.com> wrote:

> Hello,
> It has been a while since I gave an update on the state of abstracting
> schema and schema changes in MediaWiki
> <https://phabricator.wikimedia.org/T191231>, so here's a really long one.
>
> So far, around half of the MediaWiki core tables have been migrated to
> abstract schema (plus lots of extensions like Wikibase, Babel, Linter,
> BetaFeatures, etc.). Special thanks to Tgr for reviewing most of the
> patches, and to Sam Reed and James Forrester for doing the extensions.
>
> With the growing number of schemas being abstracted, this is going to
> affect your development if you work on schemas and schema changes in core
> or any extension. If you do, please read Manual:Schema changes
> <https://www.mediawiki.org/wiki/Manual:Schema_changes> on mediawiki.org.
>
> You might think that abstraction is just migrating SQL to JSON, but it's
> much more than that: we are making the database schema of MediaWiki much
> more consistent, and in the process addressing several long-standing
> issues like T164898 <https://phabricator.wikimedia.org/T164898> and T42626
> <https://phabricator.wikimedia.org/T42626>.
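>
> As a flavour of what this buys us, here is an illustrative sketch (the
> table and column are made up, and the exact type mappings are my
> assumption) of how a single abstract column definition can end up
> rendered for each DBMS:
>
>     -- MySQL: a binary string column
>     CREATE TABLE /*_*/demo_table (
>         demo_title VARBINARY(255) NOT NULL DEFAULT ''
>     ) /*$wgDBTableOptions*/;
>
>     -- Postgres: the same abstract column, rendered as TEXT
>     CREATE TABLE demo_table (
>         demo_title TEXT NOT NULL DEFAULT ''
>     );
>
>     -- Sqlite: the same column again, rendered as BLOB
>     CREATE TABLE /*_*/demo_table (
>         demo_title BLOB NOT NULL DEFAULT ''
>     );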
>
> *Improvement aspects*
>
> The first aspect is drift between the different DBMSes. The Sqlite schema
> is produced by regex replacement (this code
> <https://github.com/wikimedia/mediawiki/blob/c477bcf2c5c482d3189ec3579c5dee444eb06f7d/includes/libs/rdbms/database/DatabaseSqlite.php#L898>),
> which is less than great, but at least it comes from one place. The
> Postgres schema, however, has drifted so drastically from MySQL/Sqlite
> that fixing it has so far required 76 schema changes, addressing issues
> ranging from missing indexes to missing primary keys, AUTO_INCREMENT where
> it shouldn't be, missing DEFAULT values, drifting data types, and much
> more. You can follow the Postgres fixes here
> <https://phabricator.wikimedia.org/T164898>.
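>
> As a hedged illustration of what such fixes look like (the table and
> columns here are made up; the real patches are in the ticket), a typical
> Postgres drift fix adds a missing primary key or default:
>
>     -- Hypothetical Postgres drift fix: add the primary key and the
>     -- DEFAULT that MySQL already has but Postgres lost along the way.
>     ALTER TABLE demo_table ADD PRIMARY KEY (demo_id);
>     ALTER TABLE demo_table ALTER COLUMN demo_flags SET DEFAULT 0;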
>
> The second aspect is inconsistency in the schema itself. How do we model
> strings? VARCHAR? VARBINARY()? VARCHAR() BINARY? (All three are different
> things.) You'd be surprised how inconsistent our MySQL schema is. We are
> migrating all VARCHAR() BINARY fields to VARBINARY() (ten schema changes
> so far).
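>
> For the record, this is what the three string flavours look like side by
> side in MySQL, plus the migration pattern, in an illustrative sketch
> (made-up table and columns):
>
>     CREATE TABLE string_demo (
>         a VARCHAR(32),        -- character data, charset-aware collation
>         b VARCHAR(32) BINARY, -- character data with a binary collation
>         c VARBINARY(32)       -- raw bytes, no charset at all
>     );
>     -- The migration we are applying to VARCHAR() BINARY fields:
>     ALTER TABLE string_demo MODIFY b VARBINARY(32);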
>
> Another inconsistency is timestamps. In MySQL, around half of them are
> BINARY(14) and the other half VARBINARY(14) (while in Postgres they are
> all TIMESTAMPTZ); there is even a ticket
> <https://phabricator.wikimedia.org/T42626> about it. It makes sense to
> migrate all of them to BINARY(14), but not all timestamps are 14
> characters: expiry fields, for example, accept "infinity" as a value, and
> it's a valid timestamp in Postgres ¯\_(ツ)_/¯ When you turn an expiry
> field into BINARY(14), "infinity" gets padded to 14 bytes, and as a
> result MediaWiki no longer recognizes it as infinity ("infinity" !=
> "      infinity"). There are several possible ways forward for handling
> expiry fields; you can follow the discussion in this gerrit patch
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/631936>.
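>
> You can see the padding problem for yourself with a minimal sketch
> (made-up table name):
>
>     -- MySQL pads BINARY(14) values to exactly 14 bytes, so the stored
>     -- value no longer compares equal to the original string:
>     CREATE TABLE expiry_demo (demo_expiry BINARY(14) NOT NULL);
>     INSERT INTO expiry_demo VALUES ('infinity');
>     SELECT demo_expiry = 'infinity' FROM expiry_demo;  -- 0: no match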
>
> Another fun aspect: booleans. MySQL doesn't have a boolean type; it
> translates BOOL to TINYINT(1). Other DBMSes don't have TINYINT, but they
> do have SMALLINT and BOOL (and we mostly use SMALLINT for them already),
> so we decided to go with SMALLINT for these cases. That is different from
> what Doctrine DBAL does (it uses BOOL), so we introduced our own custom
> type for booleans.
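>
> Concretely, the same logical boolean comes out like this (illustrative
> sketch, made-up column):
>
>     -- MySQL: BOOL is just an alias for TINYINT(1)
>     CREATE TABLE bool_demo (demo_is_hidden TINYINT(1) NOT NULL DEFAULT 0);
>
>     -- Postgres/Sqlite via our custom type: SMALLINT rather than the
>     -- BOOL that Doctrine DBAL would emit by default
>     CREATE TABLE bool_demo (demo_is_hidden SMALLINT NOT NULL DEFAULT 0);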
>
> Last but not least: ENUMs. MySQL and Postgres support them, but Sqlite
> doesn't, and Doctrine DBAL doesn't support ENUM at all (as it's considered
> an anti-pattern), while core has eight fields that are ENUMs. There's an
> RFC <https://phabricator.wikimedia.org/T119173> to discourage using ENUM
> in general; feel free to comment on it.
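>
> For example (an illustrative MySQL-flavoured sketch with a made-up
> column; Postgres needs a separate CREATE TYPE and Sqlite has no
> equivalent at all), one portable alternative to ENUM is a plain string
> column with a CHECK constraint:
>
>     -- MySQL ENUM, which Sqlite can't express:
>     CREATE TABLE enum_demo (
>         demo_state ENUM('active', 'expired', 'deleted') NOT NULL
>     );
>     -- Portable alternative: plain string plus a CHECK constraint
>     CREATE TABLE enum_demo2 (
>         demo_state VARBINARY(16) NOT NULL,
>         CHECK (demo_state IN ('active', 'expired', 'deleted'))
>     );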
>
> A miscellaneous note: the directories that hold the archive of SQL patches
> for schema changes are exploding (some of the SQL patches are even
> orphaned, but we can't find them because there are so many). So I started
> an RFC to clean that mess up: Drop support for database upgrades older
> than two LTS releases <https://phabricator.wikimedia.org/T259771>.
>
> *What's next?*
>
>    -  We continue to migrate more tables; hopefully we will get two
>    thirds of them done by the end of the year (fingers crossed). You can
>    follow the progress in its ticket
>    <https://phabricator.wikimedia.org/T230428>.
>    -  We will support abstract schema changes really soon, as in a couple
>    of weeks. Basically, you write a JSON file containing before and after
>    snapshots of a table, and a maintenance script produces the needed SQL
>    patches for you for the different DBMSes. This will drastically
>    increase developer productivity, since 1) schema change SQL files
>    become more reliable, more consistent, and less prone to errors like
>    adding the column to the wrong table in some DBMSes
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/328377/22/maintenance/mssql/archives/patch-user_groups-ug_expiry.sql#4>,
>    and 2) you don't need to know Postgres or Sqlite peculiarities to make
>    patches against them. The reason you need to provide the whole table
>    even for something like adding an index is that Sqlite doesn't support
>    all forms of ALTER TABLE: in some cases you have to create a temporary
>    table, move the data around, and then rename and drop, producing
>    beautiful SQL patches like this
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/630341/1/maintenance/sqlite/archives/patch-querycachetwo-qcc_title-varbinary.sql>
>    (see the sketch after this list).
>    -  We are working on improving the script that reports drifts between
>    core and our production databases. I have already made it work with
>    abstract schemas, and I will keep working on it to report even smaller
>    differences like field size, type, etc., which is now much easier
>    thanks to the abstract schema. Slowly we will move that script to
>    production (as part of the SRE scripts) and run automated reports and
>    automated drift fixes (on small wikis). You can follow the work in
>    this ticket <https://phabricator.wikimedia.org/T104459>. So far the
>    script has been run manually, but it has already found thousands of
>    drifts across the cluster, and all of them have been fixed thanks to
>    our amazing DBAs (see the ticket).
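>
> For the curious, the Sqlite workaround mentioned above follows roughly
> this pattern (a minimal sketch with a made-up table; the real generated
> patches are linked above):
>
>     -- Sqlite can't change a column's type in place, so the generated
>     -- patch rebuilds the table: create a copy with the new definition,
>     -- move the rows over, then swap the new table in.
>     CREATE TABLE /*_*/demo_table_tmp (
>         demo_id INTEGER PRIMARY KEY,
>         demo_title BLOB NOT NULL DEFAULT ''
>     );
>     INSERT INTO /*_*/demo_table_tmp
>         SELECT demo_id, demo_title FROM /*_*/demo_table;
>     DROP TABLE /*_*/demo_table;
>     ALTER TABLE /*_*/demo_table_tmp RENAME TO /*_*/demo_table;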
>
>
> *How can I help?*
> Glad you asked! You can follow the abstract-schema
> <https://gerrit.wikimedia.org/r/q/hashtag:%22abstract-schema%22+(status:open%20OR%20status:merged)>
> hashtag in gerrit and review patches, or make patches yourself (get
> yourself familiar with the process using the documentation
> <https://www.mediawiki.org/wiki/Manual:Schema_changes>). If you maintain
> an extension, feel free to migrate its table(s) (and track it in this
> ticket <https://phabricator.wikimedia.org/T259374>). If you use Postgres
> for MediaWiki, please help us test the Postgres improvements.
>
> Thanks for reading this long email!
>
> Best
>
> --
> Amir (he/him)
>