Thanks for doing this, Magnus. I am super busy next week, going to two conferences, but I've scheduled some time near the end of October to evaluate this & see if I can get it working in the cluster.

On 10/14/11 12:58 AM, Magnus Manske wrote:
> On Thu, Oct 13, 2011 at 9:58 PM, Neil Kandalgaonkar<[email protected]>
> wrote:
>> Google has a standard for us to tell them the license, in the extended
>> Sitemap syntax for images, linked to above. That's what we should do,
>> because it would make that information available to Google, and
>> potentially to any other search engines that can read that standard.
>
> I have created a preliminary sitemap file for Commons on the toolserver.
>
> I use categories to find licenses, currently CC-BY-SA, CC-BY, GFDL,
> and PD. This can assign 9,355,602 of our 11.3M files at least one
> license. (There might be multiple entries for the same file in there,
> though.) It's far from complete, but a reasonable start IMHO.
>
> For those with toolserver access, the file is here (300MB gzipped):
> /mnt/user-store/magnus/commons.sitemap.gz
>
> Generation took 38 minutes. Script (hereby under GFDL) is here:
> /home/magnus/commons_sitemap/make_sitemap.pl (utilizing
> /home/magnus/sql_quick )
>
>
> Magnus
>
> _______________________________________________
> Commons-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/commons-l

-- 
Neil Kandalgaonkar (| <[email protected]>
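
For anyone evaluating the approach described in the quoted message, here is a minimal Python sketch of mapping file categories to a license and writing it into an image sitemap entry. It is not Magnus's make_sitemap.pl, which is Perl and not shown in this thread. The function names, the category-prefix matching, the license URLs and versions, and the example.org URLs are all assumptions for illustration; only the two namespace URIs and the image:image, image:loc and image:license elements come from the Sitemap image extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Hypothetical mapping from license family to a license URL.
# The specific versions are assumed, not taken from the thread.
# CC-BY-SA is listed before CC-BY so the longer prefix is tried first.
LICENSE_URLS = {
    "CC-BY-SA": "https://creativecommons.org/licenses/by-sa/3.0/",
    "CC-BY": "https://creativecommons.org/licenses/by/3.0/",
    "GFDL": "https://www.gnu.org/licenses/fdl.html",
    "PD": "https://creativecommons.org/publicdomain/mark/1.0/",
}


def license_for(categories):
    """Return the URL of the first known license among the categories, else None."""
    for category in categories:
        name = category.upper()
        for prefix, url in LICENSE_URLS.items():
            if name.startswith(prefix):
                return url
    return None


def build_sitemap(files):
    """Build sitemap XML from (page_url, image_url, categories) tuples."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for page_url, image_url, categories in files:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = page_url
        image = ET.SubElement(url, "{%s}image" % IMAGE_NS)
        ET.SubElement(image, "{%s}loc" % IMAGE_NS).text = image_url
        license_url = license_for(categories)
        # Files with no recognised license category get no license element.
        if license_url is not None:
            ET.SubElement(image, "{%s}license" % IMAGE_NS).text = license_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.org/wiki/File:Example.jpg",
         "https://example.org/images/Example.jpg",
         ["CC-BY-SA-3.0", "Images of cats"]),
    ]))
```

Emitting each file once, with the first matching license, would also avoid the multiple entries per file mentioned above, at the cost of dropping any additional licenses a file carries.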
