FYI, this isn't for Cloud Services, but we've got something sort of
similar for the internal analytics replicas.

https://github.com/wikimedia/analytics-refinery/blob/master/bin/analytics-mysql
https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/util.py#L135-L254
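
If it helps, the dbname-to-slice lookup those scripts do is small
enough to sketch. Here's a rough, untested Python version; the dblist
URL pattern and the s1-s8 slice list are my assumptions about the
current noc.wikimedia.org layout, so double-check them before
depending on this:

        import requests

        # Assumed URL pattern for the per-slice dblists served by noc.
        DBLIST_URL = "https://noc.wikimedia.org/conf/dblists/{}.dblist"
        SLICES = ["s%d" % i for i in range(1, 9)]  # assumed: s1-s8

        def slice_for(dbname):
            """Return the slice (e.g. 's1') whose dblist lists dbname."""
            for s in SLICES:
                text = requests.get(DBLIST_URL.format(s), timeout=10).text
                dbs = {line.strip() for line in text.splitlines()
                       if line.strip() and not line.startswith("#")}
                if dbname in dbs:
                    return s
            raise LookupError("%s is not in any slice dblist" % dbname)

        print(slice_for("enwiki"))  # enwiki is on s1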


On Tue, Dec 8, 2020 at 11:20 PM MusikAnimal <[email protected]> wrote:

> Hello again! Thinking about this more, I'm wondering if it makes sense
> to have a tool to assist with parsing the dblists at noc.wikimedia.org.
> I know the official recommendation is not to connect to slices, but the
> issue is how to work locally. I alone maintain many tools that are
> capable of connecting to any database. I have a single bash alias I use
> to set up my SSH tunnel, and when I start a new tool, I just give it
> 127.0.0.1 as the host and 4711 as the port number. Easy peasy. But I
> can't imagine trying to instruct a newbie how to contribute to tool Foo
> (which requires a tunnel to enwiki on port 1234), and tool Bar (tunnel
> to frwiki on port 5678), and so on. Perhaps it's best to establish a
> standard system for developers working locally? For the truly "global"
> tools like the ones I talked about before, we have to use slices, and
> though the slice assignments may not change much, it's a lot of work to
> check the dblists manually.
>
> So, my thought is that this tool could do two things:
> 1) A webservice with a form where you enter your username, local MySQL
> port, and whether you want to use the analytics or web replicas. After
> submitting, it prints the necessary command, something like:
>         ssh -L 4711:s1.web.db.svc.eqiad.wmflabs:3306 -L
> 4712:s2.web.db.svc.eqiad.wmflabs:3306 … [email protected]
> 2) A public API for tools to use to get the slice given a database name.
>
> Both would go by the dblists at noc.wikimedia.org (with some caching
> to improve response time); a rough sketch of both parts follows.
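>
> For concreteness, here's that sketch, covering the API for #2 and the
> command generation for #1. Flask, the route names, and the slice_for()
> helper (which would do the actual dblist parsing and caching) are all
> hypothetical, just to show the shape of it:
>
>         from flask import Flask, jsonify
>
>         from dblists import slice_for  # hypothetical helper module
>
>         app = Flask(__name__)
>
>         # Placeholder; substitute the real bastion hostname.
>         BASTION = "bastion.example.org"
>
>         @app.route("/api/slice/<dbname>")
>         def api_slice(dbname):
>             # Use case 2: public API returning the slice for a dbname.
>             return jsonify({"dbname": dbname,
>                             "slice": slice_for(dbname)})
>
>         @app.route("/api/tunnel/<user>/<int:base_port>/<cluster>")
>         def api_tunnel(user, base_port, cluster):
>             # Use case 1: emit the ssh command, one -L forward per
>             # slice, with local ports assigned sequentially.
>             forwards = " ".join(
>                 "-L %d:s%d.%s.db.svc.eqiad.wmflabs:3306"
>                 % (base_port + i, i + 1, cluster)
>                 for i in range(8))  # assumes slices s1 through s8
>             return "ssh %s %s@%s" % (forwards, user, BASTION)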
>
> So in the README for *my* tool, I tell the developer to go to the above
> to get the command they should use to set up the local SSH tunnel. The
> README could even link to a pre-filled form to ensure the port numbers
> align with what that tool expects. This way, the developer doesn't even
> need to add port numbers and whatnot to a .env file or what have you,
> since my tool goes by what the helper tool outputs (though you could
> provide a means to override this, in the event the developer has other
> things running on those ports). Hopefully what I'm saying makes sense.
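>
> To illustrate the override I have in mind, the tool-side config could
> be as simple as this (the variable names are made up):
>
>         import os
>
>         # Default to the host/port the helper assigned to this tool,
>         # but let the developer override them, e.g. if something else
>         # is already listening on that port.
>         DB_HOST = os.environ.get("REPLICA_HOST_OVERRIDE", "127.0.0.1")
>         DB_PORT = int(os.environ.get("REPLICA_PORT_OVERRIDE", "4711"))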
>
> Is this a stupid idea? I might go ahead and build a tool for the #2 use
> case, at least, because right now I will have to reinvent the wheel for at
> least three "global" tools that I maintain. We could also consider
> adding this logic to libraries, such as ToolforgeBundle
> <https://github.com/wikimedia/ToolforgeBundle>, which is for PHP/Symfony
> apps running on Cloud Services.
>
> ~ MA
>
> On Mon, Nov 23, 2020 at 10:53 AM Nicholas Skaggs <[email protected]>
> wrote:
>
>> Amir, in case you hadn't seen it, your memory is correct. This was
>> considered in the past. See
>> https://phabricator.wikimedia.org/T215858#6631859.
>>
>> On Tue, Nov 17, 2020 at 2:47 PM Amir Sarabadani <[email protected]>
>> wrote:
>>
>>> Hello,
>>> Actually, Jaime's email gave me an idea. Why not have a separate,
>>> actual data lake? Like a Hadoop cluster; it could even take the data
>>> from the analytics cluster (after being sanitized, of course). I
>>> remember there were some discussions about having a Hadoop or Presto
>>> cluster in WM Cloud.
>>>
>>> Has this been considered?
>>>
>>> Thanks.
>>>
>>> On Tue, Nov 17, 2020 at 8:05 PM Brooke Storm <[email protected]>
>>> wrote:
>>>
>>>> ACN: Thanks! We’ve created a ticket for that one to help collaborate
>>>> and surface the process here: https://phabricator.wikimedia.org/T267992
>>>> Anybody working on that, please add info there.
>>>>
>>>> Brooke Storm
>>>> Staff SRE
>>>> Wikimedia Cloud Services
>>>> [email protected]
>>>> IRC: bstorm
>>>>
>>>> On Nov 17, 2020, at 12:01 PM, AntiCompositeNumber <
>>>> [email protected]> wrote:
>>>>
>>>> I took a look at converting the query used for GreenC Bot's Job 10,
>>>> which tracks enwiki files that "shadow" a different file on Commons.
>>>> It is currently run daily, and the query executes in about 60-90
>>>> seconds. I tried three methods to recreate that query without a SQL
>>>> cross-database join. The naive method of "just give me all the files"
>>>> didn't work because it timed out somewhere. The paginated version of
>>>> that query was on track to take over 5 hours to complete. A similar
>>>> method that emulates a subquery instead of a join was projected to
>>>> take about 6 hours. Both stopped early because I got bored of watching
>>>> them and PAWS doesn't work unattended. I also wasn't able to properly
>>>> test them because people kept fixing the shadowed files before the
>>>> script got to them. The code is at
>>>> <https://public.paws.wmcloud.org/User:AntiCompositeBot/ShadowsCommonsQuery.ipynb>.
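>>>>
>>>> For reference, the subquery-emulation version boils down to roughly
>>>> the following (a sketch, not the exact notebook code; I'm assuming
>>>> the new per-wiki replica hostnames and the standard image table):
>>>>
>>>>         import os
>>>>         import pymysql
>>>>
>>>>         def connect(host, db):
>>>>             return pymysql.connect(
>>>>                 host=host, db=db,
>>>>                 read_default_file=os.path.expanduser(
>>>>                     "~/replica.my.cnf"))
>>>>
>>>>         enwiki = connect("enwiki.analytics.db.svc.wikimedia.cloud",
>>>>                          "enwiki_p")
>>>>         commons = connect(
>>>>             "commonswiki.analytics.db.svc.wikimedia.cloud",
>>>>             "commonswiki_p")
>>>>
>>>>         BATCH = 10000
>>>>         last = ""
>>>>         shadows = []
>>>>         with enwiki.cursor() as ecur, commons.cursor() as ccur:
>>>>             while True:
>>>>                 # Page through enwiki's files in img_name order.
>>>>                 ecur.execute(
>>>>                     "SELECT img_name FROM image WHERE img_name > %s "
>>>>                     "ORDER BY img_name LIMIT %s", (last, BATCH))
>>>>                 names = [row[0] for row in ecur.fetchall()]
>>>>                 if not names:
>>>>                     break
>>>>                 last = names[-1]
>>>>                 # Emulate the subquery: which of this batch of
>>>>                 # names also exist on Commons?
>>>>                 marks = ",".join(["%s"] * len(names))
>>>>                 ccur.execute(
>>>>                     "SELECT img_name FROM image "
>>>>                     "WHERE img_name IN (%s)" % marks, names)
>>>>                 shadows.extend(row[0] for row in ccur.fetchall())
>>>>
>>>>         print(len(shadows), "enwiki files shadow a Commons file")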
>>>>
>>>> ACN
>>>>
>>>> On Tue, Nov 17, 2020 at 1:02 PM Maarten Dammers <[email protected]>
>>>> wrote:
>>>>
>>>>
>>>> Hi Joaquin,
>>>>
>>>> On 16-11-2020 21:42, Joaquin Oltra Hernandez wrote:
>>>>
>>>>> Hi Maarten,
>>>>>
>>>>> I believe this work started many years ago, and it was paused, and
>>>>> recently restarted because of the stability and performance problems
>>>>> in the last years.
>>>>
>>>> You do realize the current setup was announced as new 3 years ago? See
>>>> <https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/>.
>>>>
>>>>> I'm sorry about the extra work this will cause, I hope the improved
>>>>> stability and performance will make it worth it for you, and that you
>>>>> will reconsider and migrate your code to work on the new architecture
>>>>> (or reach out for specific help if you need it).
>>>>
>>>> No, saying sorry won't make it right, and no, it won't make it worth
>>>> it for me. If I want very stable access to a single wiki, I'll use the
>>>> API of that wiki.
>>>>
>>>>> --
>>>>> Joaquin Oltra Hernandez
>>>>> Developer Advocate - Wikimedia Foundation
>>>>
>>>> It currently doesn't really feel to me that you're advocating for the
>>>> developers; it feels more like you're the unlucky person having to
>>>> sell the bad WMF management decisions to the angry developers.
>>>>
>>>> Maarten
>>>>
>>>
>>>
>>> --
>>> Amir (he/him)
>>>
>>
>>
>> --
>> *Nicholas Skaggs*
>> Engineering Manager, Cloud Services
>> Wikimedia Foundation <https://wikimediafoundation.org/>
>
_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud
