I'm just playing around on tools-sgebastion-08.  I can dump the first 1 million 
image names about half a minute:

> tools.spi-tools-dev:xw-join$ time mysql  --defaults-file=$HOME/replica.my.cnf 
> -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select 
> img_name from image limit 10000000 ' > /dev/null
> 
> real    0m36.586s
> user    0m9.678s
> sys     0m1.324s

but if I try 10 million, it fails:

> tools.spi-tools-dev:xw-join$ time mysql  --defaults-file=$HOME/replica.my.cnf 
> -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select 
> img_name from image limit 100000000 ' > /dev/null
> Killed
> 
> real    0m9.875s
> user    0m1.417s
> sys     0m1.561s

Is there some maximum query size configured by default?  The full image table 
on commons is about 70M rows, so extrapolating from the first example, 
something like 1 hour to move all that data.


> On Mar 31, 2021, at 6:09 PM, Daniel Schwen <[email protected]> wrote:
> 
> I run FastilyBot on a Raspberry Pi, and needless to say it would be grossly 
> impractical for me to perform a "join" in the bot's code.
> 
> Why not run it on WMF Cloud? In code joins will very likely work there and 
> Cloud is supported. You are effectively asking to also support a second way 
> here.
> 
> And I think it is by no means "needless" to say that it would be "grossly 
> impractical" to do this in the bot code. Joining page titles from commons and 
> en.wp could for example be done by having the DB servers return a sorted 
> stream of page titles (incrementally with limit). Your app code would advance 
> the stream that has the lexicographically smaller title (returning equal 
> titles). This would require little effort on the Pi, however it would 
> transfer two big data streams (2-3 GB maybe - so feasibility would depend on 
> your connection, but again this would be cheap on Cloud!). 
> _______________________________________________
> Wikimedia Cloud Services mailing list
> [email protected] (formerly [email protected])
> https://lists.wikimedia.org/mailman/listinfo/cloud

_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud

Reply via email to