Re: [Multimedia] [Commons-l] Hashing Wikimedia Commons

Gergo Tisza Fri, 05 Sep 2014 04:44:13 -0700

On Fri, Sep 5, 2014 at 10:21 AM, Jonas Öberg <[email protected]>
wrote:


> It's possible to use Special:Redirect or thumb.php to get the
> thumbnail/URL, but both are actually PHP scripts that need running. So
> while perhaps not ideal, it seems to make the most sense here to
> generate the thumbnail URLs ourselves and hit the web server directly.
>

That can work if you don't mind getting errors in some percentage of cases,
namely where the file format would require a more complex URL scheme.
Otherwise, you have three options:

   - just use Special:Redirect. Depending on your request frequency, it
   might be fine. We can ask ops what speed limit would be reasonable; for
   bots using the API, the general recommendation is 12 requests per minute.
   - scrape file description pages. The HTML page is cached in Varnish and
   it has links to various standard image sizes, so you won't hit PHP this
   way; of course, HTML scraping is not the most reliable way of retrieving
   data.
   - use the API in batches. You can retrieve the information (including
   thumbnail URL) for 500 files in a single request (5000 if you get a bot
   flag):

https://en.wikipedia.org/w/api.php?format=jsonfm&action=query&titles=File:30C3_Commons_Machinery_1.jpg|File:30C3_Commons_Machinery_2.jpg|File:30C3_Commons_Machinery_3.jpg&prop=imageinfo&iiprop=extmetadata|url&iiextmetadatafilter=ObjectName|Artist|LicenseShortName&iiurlwidth=640

IMO the last option is the cleanest one.
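
A minimal sketch of that batched approach, in Python with only the standard
library. It reuses the endpoint and parameters from the example URL above but
asks for format=json instead of jsonfm so the reply can be parsed. The function
names, the User-Agent string and BATCH_SIZE are placeholders of mine, not
anything the API prescribes; BATCH_SIZE is a deliberately conservative
assumption, since the limit that applies depends on your account's rights.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # endpoint from the example URL
BATCH_SIZE = 50  # assumption: conservative; raise it if your limits allow


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query_url(titles, width=640):
    """Build one imageinfo query URL for a batch of file titles."""
    params = {
        "format": "json",
        "action": "query",
        "titles": "|".join(titles),
        "prop": "imageinfo",
        "iiprop": "extmetadata|url",
        "iiextmetadatafilter": "ObjectName|Artist|LicenseShortName",
        "iiurlwidth": str(width),
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_thumb_urls(response):
    """Map each page title to its thumbnail URL, skipping pages without imageinfo."""
    pages = response.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        info = page.get("imageinfo")
        if info and "thumburl" in info[0]:
            result[page["title"]] = info[0]["thumburl"]
    return result


def fetch_thumb_urls(titles, width=640):
    """Query the API batch by batch and collect thumbnail URLs (network I/O)."""
    thumbs = {}
    for batch in chunked(titles, BATCH_SIZE):
        request = urllib.request.Request(
            build_query_url(batch, width),
            # Placeholder User-Agent; replace with one that identifies your tool.
            headers={"User-Agent": "thumb-url-sketch/0.1 (placeholder contact)"},
        )
        with urllib.request.urlopen(request) as reply:
            thumbs.update(extract_thumb_urls(json.load(reply)))
    return thumbs
```

Calling fetch_thumb_urls() with the three File:30C3_Commons_Machinery_*.jpg
titles from the URL above should give back one thumbnail URL per file. Note
that the keys are the titles as the API reports them, which may be normalized
(for example underscores turned into spaces), and that files the API cannot
find are silently left out of the result.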

_______________________________________________
Multimedia mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/multimedia
