On Wed, Oct 18, 2017 at 8:01 PM, Platonides <[email protected]> wrote:
>> It is not a happy thing for us to force anyone to change their
>> software, but as explained in the wiki page [0] we can not find a
>> reliable method to ensure that the same user created tables are
>> available on all three of the new backend servers
>
> Why not provide a fourth host and have those three servers act as slaves of
> it?
>
> Writes go to the first one, but reads and joins can go to the replicas.
That scenario makes that single master host a single point of failure
where all writes to the co-located tables would have to be made. Taking
that one host down for maintenance would halt all ability to write to
the tables.

There is an open discussion about specially "curated" datasets that
will likely work like this
(<https://phabricator.wikimedia.org/T173511#3556395>). The details are
still a work in progress, but the current rough idea is that
tables/databases to be replicated will need to meet certain structural
restrictions to enable robust replication, and will need well defined
owners who will be responsible for assisting in responding to issues
related to their tables. Another likely restriction is that the tables
must be populated by some sort of batch/bulk loading that can be
reproduced as needed if the master server has to be rebuilt from
scratch.

> I'm afraid that "make the JOINs in user space" may end up on some
> cases with the app fetching all the rows of the tool table or the wiki
> replica in order to perform a JOIN that used to be straightforward.
> And the tools aren't really the place to implement all kinds of
> partial-fetch implementations and query estimates (without EXPLAIN,
> even!). Unless you know of such an efficient user space JOIN
> implementation, perhaps?

I do not have a universal user space JOIN algorithm that I'm hiding,
but there are efficient case-by-case replacements for many direct JOIN
queries. It all depends on the specific details of what data is needed
from each table. For a straight join the solution may be interleaved
fetches from two open cursors. For a join used to limit results from
another table the solution may be batched queries using a WHERE x IN
(...) constraint. Working with physically sharded databases is not a
completely new constraint in computer science. I'm confident that
reasonable solutions can be found for most use cases.
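To make the two patterns above concrete, here is a minimal Python sketch.
It uses two in-memory sqlite3 databases as stand-ins for the two
physically separate servers, and the table/column names (tool_pages,
page, page_id) are hypothetical examples, not the real replica schema:

```python
import sqlite3

# "Tool" database: a user-created table of page ids the tool cares about.
tool_db = sqlite3.connect(":memory:")
tool_db.execute("CREATE TABLE tool_pages (page_id INTEGER PRIMARY KEY, note TEXT)")
tool_db.executemany("INSERT INTO tool_pages VALUES (?, ?)",
                    [(1, "a"), (3, "b"), (5, "c"), (7, "d")])

# "Wiki replica" database: the table we can no longer JOIN against directly.
wiki_db = sqlite3.connect(":memory:")
wiki_db.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, title TEXT)")
wiki_db.executemany("INSERT INTO page VALUES (?, ?)",
                    [(i, "Page_%d" % i) for i in range(1, 9)])

def batched_in_join(batch_size=100):
    """Limit results from the other server with batched WHERE page_id IN (...) queries."""
    ids = [r[0] for r in
           tool_db.execute("SELECT page_id FROM tool_pages ORDER BY page_id")]
    results = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        results.extend(wiki_db.execute(
            "SELECT page_id, title FROM page WHERE page_id IN (%s) ORDER BY page_id"
            % placeholders, batch))
    return results

def merge_join():
    """Straight join via interleaved fetches from two cursors ordered on the join key."""
    left = tool_db.execute("SELECT page_id, note FROM tool_pages ORDER BY page_id")
    right = wiki_db.execute("SELECT page_id, title FROM page ORDER BY page_id")
    out, l, r = [], left.fetchone(), right.fetchone()
    while l and r:
        if l[0] == r[0]:
            out.append((l[0], l[1], r[1]))  # joined row
            l, r = left.fetchone(), right.fetchone()
        elif l[0] < r[0]:
            l = left.fetchone()
        else:
            r = right.fetchone()
    return out

print(batched_in_join())
print(merge_join())
```

Neither approach ever pulls the whole remote table: the IN-batch version
sends only the keys it needs, and the merge version streams both cursors
once in key order.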
I would personally be happy to help people find workable solutions on
this mailing list, in Phabricator tasks, or even on IRC. I think there
are others in our community of developers who would be willing to help
out as well. Maybe we can all collaborate to document some common (and
uncommon) solutions on a wikitech help page.

Again, I want to acknowledge that the removal of this long standing
feature is not the ideal outcome. I do not enjoy making decisions that
make things more difficult for our technical community. I don't want
to close the door on looking for better solutions, but the hard
reality is that labsdb1001 and labsdb1003 are dying. We need to get
all of the traffic off of them as soon as we can. If either of them
has a hardware failure we will not be able to restore it. This rushed
timeline is not ideal, but at the same time I do not want to lock our
new servers into unsustainable configurations to alleviate a short
term pain.

PS. I don't mean to "well actually" your EXPLAIN statement, but on the
MariaDB servers EXPLAIN is available in a slightly modified form
(<https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Optimizing_queries>)
that works with the intervening view layer and without requiring
elevated user privileges.

Bryan

--
Bryan Davis                 Wikimedia Foundation    <[email protected]>
[[m:User:BDavis_(WMF)]]     Manager, Cloud Services
Boise, ID USA               irc: bd808              v:415.839.6885 x6855

_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud
