Thanks Richard, that was exactly what I had hoped to find!
Thanks
/Omid

On Tue, Mar 4, 2008 at 3:10 AM, Richard Cyganiak <[EMAIL PROTECTED]> wrote:
> Omid,
>
> See this thread from the mailing list archives for a summary of what
> we know about image (and audio file) URLs in Wikipedia:
> http://sourceforge.net/mailarchive/forum.php?thread_name=82593ac00801310402g555e0206h46f1b46a9666ff82%40mail.gmail.com&forum_name=dbpedia-discussion
>
> Richard
>
>
>
> On 4 Mar 2008, at 05:32, Omid Rouhani wrote:
>
> > The DBPedia dataset "Images" has mappings from wikipedia ids to the
> > actual URL where the image can be downloaded.
> >
> > Example:
> > <http://dbpedia.org/resource/%22Buzz%21%21%22_The_Movie>
> > <http://xmlns.com/foaf/0.1/depiction>
> > <http://upload.wikimedia.org/wikipedia/commons/thumb/2/2a/z_BTM.jpg/200px-z_BTM.jpg>
> > .
> >
> >
> > However, many of the images have moved around and no longer exist.
> > What is the proper way if I want to download (and locally) cache all
> > (or at least a large number of) images from Wikipedia?
> >
> > I have downloaded the full Wikipedia dump
> > "enwiki-20080103-pages-articles.xml.bz2" which do contain all image
> > names (for example, [[Image:Anarchy-symbol.svg|....]]").
> > But that piece does not give me the actual URL to the image?
> > What is the proper way to download all images? I could always call
> > "http://en.wikipedia.org/wiki/Image:Anarchy-symbol.svg" for each image
> > and from there get the actual URL (which in this case happens to be
> > "http://upload.wikimedia.org/wikipedia/commons/thumb/7/7a/Anarchy-symbol.svg/256px-Anarchy-symbol.svg.png")
> > and download that image from the URL.
> >
> > This would require two calls to Wikipedia for each image I want to
> > download (one to get the URL, and another to get the image). Is there
> > anyway I can do this without putting such stress to their servers?
> > It would be good if at least there was a way to get all URLs
> > automatically so I do not have to do two calls but only one call per
> > image.
> >
> > I have by the way downloaded the "enwiki-20080103-image.sql.gz"
> > dataset, but it only contains meta data about the images, not the URLs
> > so I can fetch them.
> > Since DBPedias "Image" dataset contains all URLs I assume there is
> > someway to obtain all the URLs in a batch. Or has the DBPedia team
> > also fetched the URLs by doing one call per image to
> > "http://en.wikipedia.org/wiki/Image:XXXXXX"?
> >
> >
> > Thanks
> > /Omid
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Microsoft
> > Defy all challenges. Microsoft(R) Visual Studio 2008.
> > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
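
For readers who want to avoid the extra request per image discussed in this thread, here is a minimal sketch of how an upload URL might be derived from the file name alone. It assumes MediaWiki's hashed upload directory layout, where the two directory levels are the first one and first two hex digits of the MD5 of the normalized file name; this is consistent with the ".../7/7a/Anarchy-symbol.svg/..." example quoted in this message, but it is an assumption rather than something the thread states. The function names, the UPLOAD_BASE constant and the "repo" parameter are hypothetical. The file name by itself does not say whether a file lives under "commons" or "en", so a caller would still have to try both or look that up elsewhere.

```python
import hashlib
from urllib.parse import quote

# Assumption: base path taken from the URLs quoted in this thread.
UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia"


def normalize_name(name: str) -> str:
    """Replace spaces with underscores and upper-case the first character."""
    name = name.strip().replace(" ", "_")
    return name[:1].upper() + name[1:]


def hashed_dir(name: str) -> str:
    """Return the 'h/hh' directory pair derived from the MD5 of the name."""
    digest = hashlib.md5(normalize_name(name).encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}"


def original_url(name: str, repo: str = "commons") -> str:
    """Guess the URL of the full-size file (repo is 'commons' or 'en')."""
    n = normalize_name(name)
    return f"{UPLOAD_BASE}/{repo}/{hashed_dir(n)}/{quote(n)}"


def thumb_url(name: str, width: int, repo: str = "commons") -> str:
    """Guess the URL of a thumbnail; SVG thumbnails are served as PNG."""
    n = normalize_name(name)
    thumb_name = f"{width}px-{n}"
    if n.lower().endswith(".svg"):
        thumb_name += ".png"
    return (
        f"{UPLOAD_BASE}/{repo}/thumb/{hashed_dir(n)}/"
        f"{quote(n)}/{quote(thumb_name)}"
    )
```

Calling thumb_url("Anarchy-symbol.svg", 256) builds a URL of the same shape as the one quoted above. A guessed URL can still return 404 (wrong repository, renamed or deleted file), so any bulk download would need to handle misses and should be rate-limited.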
