FYI, this isn't for Cloud Services, but we've got something sorta similar for internal analytics replicas:
https://github.com/wikimedia/analytics-refinery/blob/master/bin/analytics-mysql
https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/util.py#L135-L254

On Tue, Dec 8, 2020 at 11:20 PM MusikAnimal <[email protected]> wrote:

> Hello again! Thinking about this more, I'm wondering if it makes sense to
> have a tool to assist with parsing the dblists at noc.wikimedia.org. I
> know the official recommendation is not to connect to slices, but the
> issue is how to work locally. I alone maintain many tools that are capable
> of connecting to any database. I have a single bash alias I use to set up
> my SSH tunnel. When I start a new tool, I just give it 127.0.0.1 as the
> host and 4711 as the port number. Easy peasy. I can't imagine trying to
> instruct a newbie how to contribute to tool Foo (which requires a tunnel
> to enwiki on port 1234), and tool Bar (tunnel to frwiki on port 5678),
> etc., etc. Perhaps it's best to establish a standard system for developers
> working locally? For the truly "global" tools like I talked about before,
> we have to use slices, and though they may not change much, it's a lot of
> work to check the dblists manually.
>
> So, my thought is that this tool could do two things:
> 1) A webservice with a form where you enter your username, local MySQL
> port, and whether you want to use the analytics or web replicas. After
> submitting, it prints the necessary command, something like:
> ssh -L 4711:s1.web.db.svc.eqiad.wmflabs:3306 -L
> 4712:s2.web.db.svc.eqiad.wmflabs:3306 … [email protected]
> 2) A public API that tools can use to get the slice for a given database
> name.
>
> Both would go by the dblists at noc.wikimedia.org (with some caching to
> improve response time).
>
> So in the README for *my* tool, I tell the developer to go to the above
> to get the command they should use to set up the local SSH tunnel. The
> README file could even link to a pre-filled form to ensure the port
> numbers align with what that tool expects. This way, the developer doesn't
> even need to add port numbers and whatnot to a .env file, since the tool
> goes by what the above tool outputs (though you could provide a means to
> override this, in the event the developer has other things running on
> those ports). Hopefully what I'm saying makes sense.
>
> Is this a stupid idea? I might go ahead and build a tool for the #2 use
> case, at least, because right now I will have to reinvent the wheel for at
> least three "global" tools that I maintain. We could also consider adding
> this logic to libraries, such as ToolforgeBundle
> <https://github.com/wikimedia/ToolforgeBundle>, which is for PHP/Symfony
> apps running on Cloud Services.
>
> ~ MA
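For a concrete picture, here is a minimal Python sketch of the #2 idea (and
the ssh command from #1), assuming the dblists stay available at
https://noc.wikimedia.org/conf/dblists/<slice>.dblist in their current
one-name-per-line format. The s1-s8 slice names, the hostname pattern, the
ports, and the login host are taken from the thread or are assumptions, not
any published API:

# Sketch: resolve db names to slices via the noc.wikimedia.org dblists,
# then emit an ssh tunnel command with one -L forward per distinct slice.
# URL pattern and slice list are assumptions based on the current site.
import urllib.request

SLICES = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]

def build_slice_map():
    """Return a dict mapping each database name (e.g. 'enwiki') to its slice."""
    mapping = {}
    for s in SLICES:
        url = f"https://noc.wikimedia.org/conf/dblists/{s}.dblist"
        with urllib.request.urlopen(url) as resp:
            for line in resp.read().decode("utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#"):  # skip comments/blanks
                    mapping[line] = s
    return mapping

def ssh_tunnel_command(dbnames, cluster="web", base_port=4711,
                       user="musikanimal"):
    """Build the ssh command: one -L forward per distinct slice needed.

    cluster is "web" or "analytics", per the form in idea #1.
    """
    slice_map = build_slice_map()
    slices = sorted({slice_map[db] for db in dbnames})
    forwards = " ".join(
        f"-L {base_port + i}:{s}.{cluster}.db.svc.eqiad.wmflabs:3306"
        for i, s in enumerate(slices)
    )
    return f"ssh {forwards} {user}@login.toolforge.org"

print(ssh_tunnel_command(["enwiki", "frwiki"]))

Caching the fetched dblists, as the email suggests, would matter in
practice: each cold lookup otherwise costs eight HTTP requests.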
> On Mon, Nov 23, 2020 at 10:53 AM Nicholas Skaggs <[email protected]>
> wrote:
>
>> Amir, in case you hadn't seen it, your memory is correct. This was
>> considered in the past. See
>> https://phabricator.wikimedia.org/T215858#6631859.
>>
>> On Tue, Nov 17, 2020 at 2:47 PM Amir Sarabadani <[email protected]>
>> wrote:
>>
>>> Hello,
>>> Actually, Jaime's email gave me an idea. Why not have a separate actual
>>> data lake? Like a Hadoop cluster; it could even take the data from the
>>> analytics cluster (after being sanitized, of course). I remember there
>>> were some discussions about having a Hadoop or Presto cluster in WM
>>> Cloud.
>>>
>>> Has this been considered?
>>>
>>> Thanks.
>>>
>>> On Tue, Nov 17, 2020 at 8:05 PM Brooke Storm <[email protected]>
>>> wrote:
>>>
>>>> ACN: Thanks! We’ve created a ticket for that one to help collaborate
>>>> and surface the process here: https://phabricator.wikimedia.org/T267992
>>>> Anybody working on that, please add info there.
>>>>
>>>> Brooke Storm
>>>> Staff SRE
>>>> Wikimedia Cloud Services
>>>> [email protected]
>>>> IRC: bstorm
>>>>
>>>> On Nov 17, 2020, at 12:01 PM, AntiCompositeNumber
>>>> <[email protected]> wrote:
>>>>
>>>> I took a look at converting the query used for GreenC Bot's Job 10,
>>>> which tracks enwiki files that "shadow" a different file on Commons.
>>>> It is currently run daily, and the query executes in about 60-90
>>>> seconds. I tried three methods to recreate that query without a SQL
>>>> cross-database join. The naive method of "just give me all the files"
>>>> didn't work because it timed out somewhere. The paginated version of
>>>> that query was on track to take over 5 hours to complete. A similar
>>>> method that emulates a subquery instead of a join was projected to
>>>> take about 6 hours. Both stopped early because I got bored of watching
>>>> them, and PAWS doesn't work unattended. I also wasn't able to properly
>>>> test them because people kept fixing the shadowed files before the
>>>> script got to them. The code is at
>>>> <https://public.paws.wmcloud.org/User:AntiCompositeBot/ShadowsCommonsQuery.ipynb>.
>>>>
>>>> ACN
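For readers who want the shape of the batched approach ACN describes, here
is a rough Python sketch (not ACN's actual notebook; see the PAWS link above
for that). It pages through enwiki's image table by name and probes
commonswiki for matching titles chunk by chunk, instead of one
cross-database JOIN. It assumes the 'toolforge' Python package's connect()
helper and the standard MediaWiki image table; the batch size is arbitrary:

# Sketch: emulate a cross-wiki join with keyset pagination plus an
# "emulated subquery" (IN clause) against the second wiki's replica.
import toolforge  # Toolforge helper library; connect() returns a pymysql connection

BATCH = 500

enwiki = toolforge.connect("enwiki")        # replica of enwiki_p
commons = toolforge.connect("commonswiki")  # replica of commonswiki_p

shadowing = []
last_name = ""
while True:
    with enwiki.cursor() as cur:
        # Keyset pagination: the next BATCH file names after last_name.
        cur.execute(
            "SELECT img_name FROM image WHERE img_name > %s "
            "ORDER BY img_name LIMIT %s",
            (last_name, BATCH),
        )
        chunk = [row[0] for row in cur.fetchall()]
    if not chunk:
        break
    last_name = chunk[-1]
    placeholders = ", ".join(["%s"] * len(chunk))
    with commons.cursor() as cur:
        # Probe Commons for the same titles -- the "emulated subquery" step.
        cur.execute(
            f"SELECT img_name FROM image WHERE img_name IN ({placeholders})",
            chunk,
        )
        shadowing.extend(row[0] for row in cur.fetchall())

print(len(shadowing), "enwiki files shadow a Commons file")

Even in this form it is one round trip per chunk over the full enwiki image
table, which lines up with the multi-hour projections ACN reports above.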
>>>> On Tue, Nov 17, 2020 at 1:02 PM Maarten Dammers <[email protected]>
>>>> wrote:
>>>>
>>>> Hi Joaquin,
>>>>
>>>> On 16-11-2020 21:42, Joaquin Oltra Hernandez wrote:
>>>>
>>>> Hi Maarten,
>>>>
>>>> I believe this work started many years ago, was paused, and was
>>>> recently restarted because of the stability and performance problems
>>>> of the last few years.
>>>>
>>>> You do realize the current setup was announced as new 3 years ago? See
>>>> https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/.
>>>>
>>>> I'm sorry about the extra work this will cause. I hope the improved
>>>> stability and performance will make it worth it for you, and that you
>>>> will reconsider and migrate your code to work on the new architecture
>>>> (or reach out for specific help if you need it).
>>>>
>>>> No, saying sorry won't make it right, and no, it won't make it worth
>>>> it for me. If I want very stable access to a single wiki, I'll use the
>>>> API of that wiki.
>>>>
>>>> --
>>>> Joaquin Oltra Hernandez
>>>> Developer Advocate - Wikimedia Foundation
>>>>
>>>> It currently doesn't really feel to me that you're advocating for the
>>>> developers; it feels more like you're the unlucky person having to
>>>> sell the bad WMF management decisions to the angry developers.
>>>>
>>>> Maarten
>>>
>>> --
>>> Amir (he/him)
>>
>> --
>> *Nicholas Skaggs*
>> Engineering Manager, Cloud Services
>> Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud
