Thanks for doing this, Magnus. I am super busy next week, going to two
conferences, but I've scheduled some time near the end of October to
evaluate this & see if I can get it working in the cluster.
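
For anyone following along without toolserver access, here is a rough
sketch of the kind of output being discussed: a sitemap whose entries carry
a license URL in Google's image extension. It is not Magnus's Perl script.
The category names, license URLs, example file URLs, and the helper names
license_for and build_sitemap are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Hypothetical category-to-license mapping; the categories actually used
# on Commons and the exact license URLs may differ.
LICENSE_BY_CATEGORY = {
    "CC-BY-SA-3.0": "https://creativecommons.org/licenses/by-sa/3.0/",
    "CC-BY-3.0": "https://creativecommons.org/licenses/by/3.0/",
}


def license_for(categories):
    """Return the license URL of the first recognized category, or None."""
    for category in categories:
        if category in LICENSE_BY_CATEGORY:
            return LICENSE_BY_CATEGORY[category]
    return None


def build_sitemap(files):
    """Build sitemap XML from (page_url, image_url, categories) tuples.

    Files with no recognized license category are skipped.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_url, categories in files:
        license_url = license_for(categories)
        if license_url is None:
            continue
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
        ET.SubElement(image, f"{{{IMAGE_NS}}}license").text = license_url
    return ET.tostring(urlset, encoding="unicode")
```

A file that falls into several license categories would get only the first
match here; whether to emit one entry per license instead is a separate
design choice.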

On 10/14/11 12:58 AM, Magnus Manske wrote:
> On Thu, Oct 13, 2011 at 9:58 PM, Neil Kandalgaonkar <[email protected]> wrote:
>> Google has a standard for us to tell them the license, in the extended
>> Sitemap syntax for images, linked to above. That's what we should do,
>> because it would make that information available to Google, and
>> potentially to any other search engines that can read that standard.
>
> I have created a preliminary sitemap file for Commons on the toolserver.
>
> I use categories to find licenses, currently CC-BY-SA, CC-BY, GFDL,
> and PD. This can assign 9,355,602 of our 11.3M files at least one
> license. (There might be multiple entries for the same file in there,
> though.) It's far from complete, but a reasonable start IMHO.
>
> For those with toolserver access, the file is here (300MB gzipped):
> /mnt/user-store/magnus/commons.sitemap.gz
>
> Generation took 38 minutes. Script (hereby under GFDL) is here:
> /home/magnus/commons_sitemap/make_sitemap.pl (utilizing 
> /home/magnus/sql_quick )
>
>
> Magnus
>
> _______________________________________________
> Commons-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/commons-l

-- 
Neil Kandalgaonkar (|  <[email protected]>
