Omid,

See this thread from the mailing list archives for a summary of what  
we know about image (and audio file) URLs in Wikipedia:
http://sourceforge.net/mailarchive/forum.php?thread_name=82593ac00801310402g555e0206h46f1b46a9666ff82%40mail.gmail.com&forum_name=dbpedia-discussion

Richard


On 4 Mar 2008, at 05:32, Omid Rouhani wrote:

> The DBpedia dataset "Images" has mappings from Wikipedia IDs to the
> actual URLs where the images can be downloaded.
>
> Example:
> <http://dbpedia.org/resource/%22Buzz%21%21%22_The_Movie>
> <http://xmlns.com/foaf/0.1/depiction>
> <http://upload.wikimedia.org/wikipedia/commons/thumb/2/2a/z_BTM.jpg/200px-z_BTM.jpg> .
>
>
> However, many of the images have moved and no longer exist at those URLs.
> What is the proper way to download and locally cache all (or at least
> a large number of) images from Wikipedia?
>
> I have downloaded the full Wikipedia dump
> "enwiki-20080103-pages-articles.xml.bz2", which does contain all image
> names (for example, "[[Image:Anarchy-symbol.svg|....]]"),
> but it does not give me the actual URL of each image.
> What is the proper way to download all images? I could always request
> "http://en.wikipedia.org/wiki/Image:Anarchy-symbol.svg" for each image,
> extract the actual URL from that page (which in this case happens to be
> "http://upload.wikimedia.org/wikipedia/commons/thumb/7/7a/Anarchy-symbol.svg/256px-Anarchy-symbol.svg.png"),
> and download the image from that URL.
>
> This would require two calls to Wikipedia for each image I want to
> download (one to get the URL, and another to get the image). Is there
> any way I can do this without putting such stress on their servers?
> It would be good if there were at least a way to get all URLs
> automatically, so that I need only one call per image instead of two.
>
> By the way, I have also downloaded the "enwiki-20080103-image.sql.gz"
> dataset, but it only contains metadata about the images, not the URLs
> I would need to fetch them.
> Since DBpedia's "Images" dataset contains all the URLs, I assume there is
> some way to obtain them in a batch. Or did the DBpedia team
> also fetch the URLs by making one call per image to
> "http://en.wikipedia.org/wiki/Image:XXXXXX"?
>
>
> Thanks
> /Omid
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
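
A rough illustration of one way to get image URLs without a lookup request
per image. It assumes MediaWiki's hashed upload directory layout, where the
two directory levels are the first one and first two hex digits of the MD5
hash of the normalized file name (spaces replaced by underscores, first
letter capitalized). That would match the "7/7a" segment in the
Anarchy-symbol.svg URL quoted above. The function names and the UPLOAD_BASE
constant are made up for this sketch, and it further assumes the file is
hosted on Commons rather than uploaded locally to en.wikipedia.

```python
import hashlib
from urllib.parse import quote

# Assumption: the file is hosted on Wikimedia Commons. Files uploaded
# locally to en.wikipedia would live under a different base path.
UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia/commons"


def normalize_name(name):
    """Normalize a file name: spaces become underscores, first letter upper-cased."""
    name = name.strip().replace(" ", "_")
    return name[:1].upper() + name[1:]


def hashed_path(name):
    """Return 'x/xy/Name', where x and xy are the leading MD5 hex digits of the name."""
    name = normalize_name(name)
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "%s/%s/%s" % (digest[0], digest[:2], quote(name))


def original_url(name, base=UPLOAD_BASE):
    """Guess the URL of the full-size file."""
    return "%s/%s" % (base, hashed_path(name))


def thumb_url(name, width, base=UPLOAD_BASE):
    """Guess the URL of a thumbnail; SVG thumbnails are rendered as PNG."""
    quoted = quote(normalize_name(name))
    suffix = ".png" if name.lower().endswith(".svg") else ""
    return "%s/thumb/%s/%dpx-%s%s" % (base, hashed_path(name), width, quoted, suffix)


if __name__ == "__main__":
    print(original_url("Anarchy-symbol.svg"))
    print(thumb_url("Anarchy-symbol.svg", 256))
```

The names taken from the pages-articles dump could be fed through
original_url() directly, leaving one request per image. A guessed URL can
still fail: the file may be hosted on the local wiki instead of Commons, or
a thumbnail of the requested width may not be available, so a 404 should be
treated as "fall back to looking the image up", not as a hard error.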
