I'm just playing around on tools-sgebastion-08. I can dump the first 1 million image names about half a minute:
> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf > -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select > img_name from image limit 10000000 ' > /dev/null > > real 0m36.586s > user 0m9.678s > sys 0m1.324s but if I try 10 million, it fails: > tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf > -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select > img_name from image limit 100000000 ' > /dev/null > Killed > > real 0m9.875s > user 0m1.417s > sys 0m1.561s Is there some maximum query size configured by default? The full image table on commons is about 70M rows, so extrapolating from the first example, something like 1 hour to move all that data. > On Mar 31, 2021, at 6:09 PM, Daniel Schwen <[email protected]> wrote: > > I run FastilyBot on a Raspberry Pi, and needless to say it would be grossly > impractical for me to perform a "join" in the bot's code. > > Why not run it on WMF Cloud? In code joins will very likely work there and > Cloud is supported. You are effectively asking to also support a second way > here. > > And I think it is by no means "needless" to say that it would be "grossly > impractical" to do this in the bot code. Joining page titles from commons and > en.wp could for example be done by having the DB servers return a sorted > stream of page titles (incrementally with limit). Your app code would advance > the stream that has the lexicographically smaller title (returning equal > titles). This would require little effort on the Pi, however it would > transfer two big data streams (2-3 GB maybe - so feasibility would depend on > your connection, but again this would be cheap on Cloud!). > _______________________________________________ > Wikimedia Cloud Services mailing list > [email protected] (formerly [email protected]) > https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________ Wikimedia Cloud Services mailing list [email protected] (formerly [email protected]) https://lists.wikimedia.org/mailman/listinfo/cloud
