I'd like to continue exploring this, but I'm not quite sure of the appropriate way forward. I gather doing work like this on the Toolforge bastion hosts is frowned upon, so I guess what I should be doing is spinning up a VPS instance via https://horizon.wikimedia.org/? I've been reading through https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_Instances, from which I gather I need to be a project admin. Is there some existing project I can join to do this exploratory work, or should I create a new project?
> On Mar 31, 2021, at 10:35 PM, Roy Smith <[email protected]> wrote:
>
> Thanks for looking into this. I tried this again a little later, and it ran
> fine. Odd that the amount of memory used depends on the number of rows. I
> would expect it to stream results to stdout as they came in, but apparently
> not.
>
> Even weirder that the 100M example runs OOM in 10s, while the 10M example
> runs to completion in 36s. Could it be pre-allocating buffer space for the
> number of rows it expects to ultimately get? Ugh, that would be a crazy
> design, but it does seem like that's what's happening.
>
>> On Mar 31, 2021, at 9:47 PM, Brooke Storm <[email protected]> wrote:
>>
>>> On Mar 31, 2021, at 5:18 PM, Roy Smith <[email protected]> wrote:
>>>
>>> I'm just playing around on tools-sgebastion-08. I can dump the first 10
>>> million image names in about half a minute:
>>>
>>>> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select img_name from image limit 10000000' > /dev/null
>>>>
>>>> real    0m36.586s
>>>> user    0m9.678s
>>>> sys     0m1.324s
>>>
>>> but if I try 100 million, it fails:
>>>
>>>> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select img_name from image limit 100000000' > /dev/null
>>>> Killed
>>>>
>>>> real    0m9.875s
>>>> user    0m1.417s
>>>> sys     0m1.561s
>>>
>>> Is there some maximum result size configured by default? The full image
>>> table on commons is about 70M rows, so extrapolating from the first
>>> example, something like 4 minutes to move all that data.
>>
>> That could be RAM limits on the bastion.
>> Actually, scratch that, I've confirmed you were killed by the OOM killer on
>> that bastion:
>>
>> Mar 31 23:29:17 tools-sgebastion-08 kernel: [2860588.199138] mysql invoked
>> oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0,
>> oom_score_adj=0
>>
>> -Brooke
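For what it's worth, the buffering appears to be client-side: by default the mysql command-line client reads the whole result set into memory before printing anything, which would explain why memory use scales with the row count. A minimal, untimed sketch of the same dump with that buffering turned off (the --quick flag tells the client to print each row as it arrives rather than storing the result first); the host, database, and query are just copied from the commands quoted above:

    # Untested sketch: --quick makes the mysql client stream rows as they
    # arrive instead of buffering the entire result set in memory.
    time mysql --defaults-file=$HOME/replica.my.cnf --quick \
        -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
        -N -e 'select img_name from image' > /dev/null

The trade-off documented for --quick is that a slow consumer on the client side can hold locks or tie up the connection on the server longer, but for a one-shot dump to a file that seems acceptable.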
_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud
