Amir, in case you hadn't seen it, your memory is correct. This was
considered in the past. See
https://phabricator.wikimedia.org/T215858#6631859.

On Tue, Nov 17, 2020 at 2:47 PM Amir Sarabadani <[email protected]> wrote:

> Hello,
> Actually, Jaime's email gave me an idea. Why not have a separate, actual
> data lake? Something like a Hadoop cluster; it could even take data from the
> analytics cluster (after being sanitized, of course). I remember there were
> some discussions about having a Hadoop or Presto cluster in WM Cloud.
>
> Has this been considered?
>
> Thanks.
>
> On Tue, Nov 17, 2020 at 8:05 PM Brooke Storm <[email protected]> wrote:
>
>> ACN: Thanks! We’ve created a ticket for that one to help collaborate and
>> surface the process here: https://phabricator.wikimedia.org/T267992
>> Anybody working on that, please add info there.
>>
>> Brooke Storm
>> Staff SRE
>> Wikimedia Cloud Services
>> [email protected]
>> IRC: bstorm
>>
>> On Nov 17, 2020, at 12:01 PM, AntiCompositeNumber <
>> [email protected]> wrote:
>>
>> I took a look at converting the query used for GreenC Bot's Job 10,
>> which tracks enwiki files that "shadow" a different file on Commons.
>> It is currently run daily, and the query executes in about 60-90
>> seconds. I tried three methods to recreate that query without a SQL
>> cross-database join. The naive method of "just give me all the files"
>> didn't work because it timed out somewhere. The paginated version of
>> that query was on track to take over 5 hours to complete. A similar
>> method that emulates a subquery instead of a join was projected to
>> take about 6 hours. Both stopped early because I got bored of watching
>> them and PAWS doesn't work unattended. I also wasn't able to properly
>> test them because people kept fixing the shadowed files before the
>> script got to them. The code is at
>> <https://public.paws.wmcloud.org/User:AntiCompositeBot/ShadowsCommonsQuery.ipynb>.
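
A minimal sketch of the paginated, join-free approach described above, not the code from the linked notebook. It assumes the local file names can be read in pages and that the second database can be asked which names from a batch exist there. The names `batched`, `shadowing_files`, `exists_on_commons`, and the toy data are hypothetical and stand in for real queries.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Set


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def shadowing_files(
    local_names: Iterable[str],
    exists_on_commons: Callable[[List[str]], Set[str]],
    batch_size: int = 500,
) -> List[str]:
    """Emulate a cross-database join on file name with one lookup per batch."""
    shadows: List[str] = []
    for batch in batched(local_names, batch_size):
        # Stands in for a query such as
        # SELECT img_name FROM image WHERE img_name IN (...)
        # run against the second database (an assumption, not the real query).
        shadows.extend(sorted(exists_on_commons(batch)))
    return shadows


# Toy data standing in for the two databases.
enwiki_files = ["A.jpg", "B.png", "C.svg", "D.gif", "E.jpg"]
commons_files = {"B.png", "D.gif", "Z.jpg"}


def fake_commons_lookup(names: List[str]) -> Set[str]:
    """Hypothetical replacement for the per-batch query on the other server."""
    return commons_files.intersection(names)


result = shadowing_files(enwiki_files, fake_commons_lookup, batch_size=2)
```

Each batch costs one round trip to the second database, so total run time grows with the number of local files divided by the batch size. That is consistent with the multi-hour projections reported above, compared with 60-90 seconds for the server-side join.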
>>
>> ACN
>>
>> On Tue, Nov 17, 2020 at 1:02 PM Maarten Dammers <[email protected]>
>> wrote:
>>
>>
>> Hi Joaquin,
>>
>> On 16-11-2020 21:42, Joaquin Oltra Hernandez wrote:
>>
>> Hi Maarten,
>>
>> I believe this work started many years ago, was paused, and was recently
>> restarted because of the stability and performance problems of recent
>> years.
>>
>> You do realize the current setup was announced as new 3 years ago? See
>> https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/
>> .
>>
>> I'm sorry about the extra work this will cause, I hope the improved
>> stability and performance will make it worth it for you, and that you will
>> reconsider and migrate your code to work on the new architecture (or reach
>> out for specific help if you need it).
>>
>> No, saying sorry won't make it right and no, it won't make it worth it
>> for me. If I want very stable access to a single wiki, I'll use the API of
>> that wiki.
>>
>> --
>> Joaquin Oltra Hernandez
>> Developer Advocate - Wikimedia Foundation
>>
>> It currently doesn't really feel to me that you're advocating for the
>> developers; it feels more like you're the unlucky person having to sell the
>> bad WMF management decisions to the angry developers.
>>
>> Maarten
>>
>> _______________________________________________
>> Wikimedia Cloud Services mailing list
>> [email protected] (formerly [email protected])
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>
>>
>
>
> --
> Amir (he/him)
>


-- 
*Nicholas Skaggs*
Engineering Manager, Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
