Bryan,

thank you for your thoughtful response. It seems I have done you some injustice. I appreciate your due diligence, such as the tool you wrote to list database usage. However, especially in light of this tool, I have trouble understanding how the decision could have been made to break all the tools that depend on server-side joins with user tables.

I'm sure there is a lot of personal opinion going into this, and obviously my perspective and priorities as a tool developer are very different from yours as an op. To me it genuinely looked like everything had been working fine before the change, and now my biggest tool is destroyed. Hard to see the upside here.

I'm probably guilty of having my head stuck in the sand, as somehow I have been oblivious to your multiple communication channels. But in my defense, surely you have a way to associate user accounts with emails?

So was there no way to leave a server up where the joins are still possible? It wouldn't have to have the same uptime guarantees...

Cheers,
Daniel

Idaho Falls, ID USA

On Sat, Dec 23, 2017 at 8:07 PM Bryan Davis <[email protected]> wrote:

> On Sat, Dec 23, 2017 at 5:28 PM, Daniel Schwen <[email protected]> wrote:
> > I do appreciate that the ops team is working to improve reliability and
> > performance of the database access. Unfortunately it seems to me that there
> > is a disconnect between ops and tool devs. I wonder if the ops actually
> > looked at how many user databases have been created and how frequently they
> > got accessed (all that info should be readily available to them). The logs
> > would also have told the ops which users relied on user DBs on the project
> > DB servers. A direct email ahead of time would have gone a long way.
>
> As noted previously in this thread, the breaking change was first
> announced in the blog post about the new Wiki Replica servers
> (<https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/>)
> on 2017-09-25. The TL;DR and a link to the blog post were also sent to
> labs-announce (now cloud-announce) at that time:
> <https://lists.wikimedia.org/pipermail/labs-announce/2017-September/000256.html>
>
> Following that "soft" announcement:
> * I built a tool at <https://tools.wmflabs.org/tool-db-usage/> to show
>   all of the tool-owned databases that would be affected by the change.
> * I created a page on wikitech describing the timeline and impact and
>   providing a link to the tool:
>   <https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown>
> * The timeline was announced on the cloud-announce mailing list on
>   2017-10-19:
>   <https://lists.wikimedia.org/pipermail/cloud-announce/2017-October/000005.html>
> * MassMessage was used to notify the maintainers of tools that Nick
>   Wilson and I could identify via their wikitech talk pages.
>   Example at
>   <https://wikitech.wikimedia.org/w/index.php?title=User_talk:Andrew_Bogott&diff=1775669&oldid=1773948>
>
> I tried pretty hard here to make sure that tool maintainers who were
> going to be affected had months of notice. Obviously this notice did
> not reach everyone and for that I am sorry. Making announcements to
> 1500 users is difficult. The cloud-announce mailing list is really the
> best way that we as administrators have to reach out to people about
> sweeping changes like this. We can't force anyone to subscribe or to
> read the messages however.
>
> > The phabricator post contains the same language I've heard many times
> > before: The tools devs shouldn't have used the feature anyways. To that I
> > say, well, we still did and it worked great.
>
> I may be missing it, but I do not see anywhere on
> <https://phabricator.wikimedia.org/T156869> that any of the
> participants chastised the tool developers for using the feature. If I
> did say something that was taken that way, I apologize.
>
> > Volunteer developers have a
> > limited time budget with which they create tools that large amounts of users
> > (editors and readers alike) rely on. That is just the reality of things, and
> > it is not the ideal op fantasy, I know.
>
> Tool developers use the features they are given to build incredible
> things. They do this work as volunteers in time that is borrowed from
> the rest of their lives (school, work, family, editing the wikis,
> etc). The Cloud Services and DBA teams are *very* aware of this and
> very grateful for the good works that come from these precious
> investments. I have spent the last two years of my employment at the
> Foundation seeking to raise awareness of these good works and to find
> more resources to help the people who are doing them.
>
> > The ops seem to be in an asymmetric
> > position of power here. It sure sounds a lot like a take it or leave it
> > situation to me.
>
> Yes, there is an asymmetry.
> A very small number of us have to make
> decisions that affect larger numbers. This is true with the Wiki
> Replicas; it is true with Cloud Services more generally; it is true
> with on-wiki content creators vs readers. In all of these cases the
> few attempt to act in the broader best interest of the many. We try to
> have consultations with representatives of the groups that we are
> acting on behalf of. We try to use good judgment and past experience
> to make better decisions tomorrow than we made yesterday. We hope that
> the positive impacts of our works outweigh the negative impacts.
> Whether we succeed or fail in these attempts can be a matter of
> personal opinion. Not everyone will be pleased by every change; this
> is unfortunate but true.
>
> In this very specific case, I made the final call to cease looking for
> a technological advance that would allow us to keep the feature of
> user-managed databases co-located with replicated data from the
> production environment. I did this after much more extensive
> consultation with my team and the Foundation's DBAs than is reflected
> in T156869. This had been a topic of internal discussion since the
> beginning of the project to build a new Wiki Replica cluster. In the
> end, I felt that the barriers to freely re-routing database query
> traffic were too large, and the benefits of that freedom too great, to
> recreate the prior un-replicated table situation on the new cluster.
> The blog post mentions many of these benefits.
>
> We are still hoping to find a partial solution
> (<https://phabricator.wikimedia.org/T173511>) for replicating some
> non-canonical data to the new cluster. Work on that task has
> stagnated, but I hope to restart it soon. I think that Jaime has most
> of a solution in mind at this point which just needs the final details
> to be worked out before we can begin to implement it. This will not be
> a 100% solution for all tools, but it will provide some relief.
>
> I know that my responses here will not fix broken tools. I know that
> tool maintainers experience some amount of fatigue and frustration
> caused by each new change added to the environment that they are using
> to build and deliver their solutions. I do hope however that they
> restore some measure of WP:AGF for the work of the Cloud Services
> team, the DBA team, and others who are trying every day to make
> Toolforge and Cloud Services a better place for developing and
> operating volunteer-created technology.
>
> Bryan
> --
> Bryan Davis              Wikimedia Foundation    <[email protected]>
> [[m:User:BDavis_(WMF)]]  Manager, Cloud Services          Boise, ID USA
> irc: bd808                                        v:415.839.6885 x6855
>
> _______________________________________________
> Wikimedia Cloud Services mailing list
> [email protected] (formerly [email protected])
> https://lists.wikimedia.org/mailman/listinfo/cloud
