On Fri, Sep 27, 2013 at 8:27 AM, Mr. Puneet Kishor <[email protected]> wrote:
> On Sep 27, 2013, at 8:27 PM, Maarten Zeinstra <[email protected]> wrote:
>> Creative Commons Netherlands hosts its own explanation of the CC licenses
>> on http://creativecommons.nl/uitleg/. We have links to all CC licenses there,
>> and we license our entire site under CC BY 3.0.
>>
>> I received an email asking why people need to attribute Creative
>> Commons Netherlands when they want to use CC BY 3.0. It turns out that the
>> metadata scraper sees the website license and adds the extra metadata to
>> the deed page.
>>
>> How come this does not happen at https://creativecommons.org/licenses/, and
>> how would I be able to avoid this?

It doesn't happen on https://creativecommons.org/licenses/ because that page
links to non-https deeds, so the referer isn't sent and the scraper can't do
anything. (There are other problems, see below, which would make it not work
anyway.)

This also points to an easy immediate solution for you: link to the https
versions of the deeds from the .nl site, which is only served over http. The
referer won't get sent.

The only general fix I can think of would be for the scraper to be more
conservative than it is: look for bare (i.e., not objects of license
statements) license URLs, and if there's one that's the same as a license URL
that is the object of a license statement, don't add anything to the deed,
because there's no way of telling which one the user clicked on.

> This is almost impossible to answer without actually seeing the specific
> scraper that is making the mistake of combining the unported CC BY metadata
> with the CC BY NL metadata. That said, when I identify the RDFa on the page
> using W3C's N3 bookmarklet, I get
>
> <http://creativecommons.nl/uitleg/>
>     <?> "article" ;
>     <?> "Uitleg bij de Creative Commons licenties" ;
>     <?> "http://creativecommons.nl/uitleg/" ;
>     <?> "Van "Alle rechten voorbehouden" naar "Sommige rechten
>     voorbehouden" Creative Commons biedt auteurs, kunstenaars, wetenschappers,
>     docenten en alle andere creatieve makers de vrijheid om op een flexi..." ;
>     <?> "Creative Commons Nederland" ;
>     <?>
>     "http://creativecommons.nl/wp-content/uploads/2009/09/Schermafbeelding-2012-12-10-om-14.07.28.png"
>     ;
>     <http://www.w3.org/1999/xhtml#license>
>     <http://creativecommons.org/licenses/by/3.0/nl/> ;
>     <http://www.w3.org/1999/xhtml#license>
>     <http://creativecommons.org/licenses/by/3.0/nl/> ;
>     <http://creativecommons.org/ns#attributionURL>
>     <http://www.creativecommons.nl/> ;
>     <http://creativecommons.org/ns#attributionName> "Creative
>     Commons Nederland" .
>
> Doing the same on http://creativecommons.org/licenses/ gives
>
> <http://creativecommons.org/licenses/>
>     <http://www.w3.org/1999/xhtml#license>
>     <http://creativecommons.org/licenses/by/3.0/> .
>
> <http://creativecommons.org/>
>     <http://creativecommons.org/ns#attributionURL>
>     <http://creativecommons.org/> ;
>     <http://creativecommons.org/ns#attributionName> "this site" ;
>     <http://www.w3.org/1999/xhtml#license>
>     <http://creativecommons.org/licenses/by/3.0/> .
>
> As you can see, the first one has the URI to the NL version, while the latter
> points to the unported version.

This is quite broken. I see from archive.org that it has been in place since
September 2012. Maybe someone would've noticed if it caused obviously
incorrect behavior, rather than just sitting there being wrong, or maybe
nobody cares about metadata. ;)

Anyway:

* It's kind of silly for every page on the site to make statements about the
  CC home page.
* The parser you're using seems to be appending / after the hostname, but I'm
  not sure that can be counted on. No trailing / is specified in the page,
  which means the subject will never match the referer, even when clicking
  from the home page, as the referer from the home page will always be
  http[s]://creativecommons.org/ (note the trailing slash).
* The CC home page isn't the attribution URL most useful to users: following
  it won't lead directly back to the material of interest, unless that just
  happens to be the CC home page. (This applies to the CC NL attributionURL
  above as well.)
* "this site" as the attribution name is plain silly.
* Fixes could include removing the about property, changing attributionURL to
  "" (i.e., the current page), and rewording so that attributionName can be
  "Creative Commons", or just skipping that annotation.

Mike
_______________________________________________
cc-devel mailing list
[email protected]
http://lists.ibiblio.org/mailman/listinfo/cc-devel
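
The more conservative scraper rule proposed in the message above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual CC scraper: the `ScrapedPage` model, the `attribution_for_deed` and `normalize` helpers, and the choice to compare URLs ignoring scheme and trailing slash are all hypothetical, and the example page contents are assumed rather than taken from creativecommons.nl.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Hypothetical comparison key: host plus path, ignoring the scheme and a
    trailing slash, so http/https and "/nl" vs "/nl/" compare equal."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


@dataclass
class ScrapedPage:
    # Hypothetical model of what a scraper extracts from the referring page.
    # license_statements maps each license URL that is the object of a license
    # statement (e.g. xhtml:license) to the attributionName given for it.
    license_statements: dict[str, str] = field(default_factory=dict)
    # Bare license links: license URLs linked from the page that are NOT the
    # object of a license statement (e.g. a list of all CC licenses).
    bare_license_links: list[str] = field(default_factory=list)


def attribution_for_deed(page: ScrapedPage, deed_url: str) -> Optional[str]:
    """Return the attribution name to add to the deed, or None to add nothing."""
    target = normalize(deed_url)
    names = [name for lic, name in page.license_statements.items()
             if normalize(lic) == target]
    if not names:
        # The deed is not a license the page declares for itself.
        return None
    if any(normalize(link) == target for link in page.bare_license_links):
        # The same license URL is both a license statement object and a bare
        # link, so there is no way to tell which one the visitor clicked.
        return None
    return names[0]


if __name__ == "__main__":
    # Assumed page resembling the CC NL case: it declares BY 3.0 NL and also
    # links to the BY 3.0 NL and unported deeds as plain links.
    nl_page = ScrapedPage(
        license_statements={
            "http://creativecommons.org/licenses/by/3.0/nl/": "Creative Commons Nederland",
        },
        bare_license_links=[
            "http://creativecommons.org/licenses/by/3.0/nl/",
            "http://creativecommons.org/licenses/by/3.0/",
        ],
    )
    print(attribution_for_deed(nl_page, "https://creativecommons.org/licenses/by/3.0/nl/"))
```

The trade-off is the one described in the message: a page that declares a license and also links to that same deed loses its attribution on the deed, in exchange for never attaching the site's attribution to a deed the visitor may have reached through a plain link.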
