Amir, in case you hadn't seen it, your memory is correct. This was considered in the past. See https://phabricator.wikimedia.org/T215858#6631859.
On Tue, Nov 17, 2020 at 2:47 PM Amir Sarabadani <[email protected]> wrote:

> Hello,
> Actually Jaime's email gave me an idea. Why not having a separate actual
> data lake? Like a hadoop cluster, it can even take the data from analytics
> cluster (after being sanitized of course). I remember there were some
> discussions about having a hadoop or Presto cluster in WM Cloud.
>
> Has this been considered?
>
> Thanks.
>
> On Tue, Nov 17, 2020 at 8:05 PM Brooke Storm <[email protected]> wrote:
>
>> ACN: Thanks! We’ve created a ticket for that one to help collaborate and
>> surface the process here: https://phabricator.wikimedia.org/T267992
>> Anybody working on that, please add info there.
>>
>> Brooke Storm
>> Staff SRE
>> Wikimedia Cloud Services
>> [email protected]
>> IRC: bstorm
>>
>> On Nov 17, 2020, at 12:01 PM, AntiCompositeNumber <[email protected]> wrote:
>>
>> I took a look at converting the query used for GreenC Bot's Job 10,
>> which tracks enwiki files that "shadow" a different file on Commons.
>> It is currently run daily, and the query executes in about 60-90
>> seconds. I tried three methods to recreate that query without a SQL
>> cross-database join. The naive method of "just give me all the files"
>> didn't work because it timed out somewhere. The paginated version of
>> that query was on track to take over 5 hours to complete. A similar
>> method that emulates a subquery instead of a join was projected to
>> take about 6 hours. Both stopped early because I got bored of watching
>> them and PAWS doesn't work unattended. I also wasn't able to properly
>> test them because people kept fixing the shadowed files before the
>> script got to them. The code is at
>> <https://public.paws.wmcloud.org/User:AntiCompositeBot/ShadowsCommonsQuery.ipynb>.
>>
>> ACN
>>
>> On Tue, Nov 17, 2020 at 1:02 PM Maarten Dammers <[email protected]>
>> wrote:
>>
>> Hi Joaquin,
>>
>> On 16-11-2020 21:42, Joaquin Oltra Hernandez wrote:
>>
>> Hi Maarten,
>>
>> I believe this work started many years ago, and it was paused, and
>> recently restarted because of the stability and performance problems in the
>> last years.
>>
>> You do realize the current setup was announced as new 3 years ago? See
>> https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/ .
>>
>> I'm sorry about the extra work this will cause, I hope the improved
>> stability and performance will make it worth it for you, and that you will
>> reconsider and migrate your code to work on the new architecture (or reach
>> out for specific help if you need it).
>>
>> No, saying sorry won't make it right and no, it won't make it worth it
>> for me. If I want very stable access to a single wiki, I'll use the API of
>> that wiki.
>>
>> --
>> Joaquin Oltra Hernandez
>> Developer Advocate - Wikimedia Foundation
>>
>> It currently doesn't really feel to me that you're advocating for the
>> developers, it feels more like you're the unlucky person having to sell the
>> bad WMF management decisions to the angry developers.
>>
>> Maarten
>>
>> _______________________________________________
>> Wikimedia Cloud Services mailing list
>> [email protected] (formerly [email protected])
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>
>
> --
> Amir (he/him)
>

--
*Nicholas Skaggs*
Engineering Manager, Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
