On 9/29/15, Jane Park <[email protected]> wrote:
> Hi everyone,
>
> I lead platform work
> <https://github.com/creativecommons/platform-initiative> at Creative
> Commons. As part of that work, we are exploring the potential of a standard
> field in EXIF that could make attribution and license info more sticky
> across the web. We are currently in the research phase -- talking to major
> image hosting platforms (and platforms that read and ingest images) about
> what kinds of image metadata they read and retain. Zhou and his engineering
> team at Wikimedia directed me to this list as I am seeking feedback from
> the Wikimedia community.
>
> Ultimately, we want to make it easier for platforms to display provenance
> and license info -- increase the likelihood that when a user lands on an
> image, they know who created it and what license to use it under. For
> example, images from Wikimedia Commons may get tweeted, but the image
> metadata is not retained in tweets. How can we work with platforms to use
> the same metadata standard so that info can be retained across them?
>
> Since we are just in the research phase now, I welcome your thoughts on
> Wikimedia Commons' and Wikipedia's own uses of image metadata.
> Specifically:
>
> 1. The most common image metadata standards we know about are EXIF and
> XMP. Which does Wikimedia primarily read and retain? Are there others
> that are more widely used?
> 2. Which standard does Wikimedia prefer? What would be easiest to
> implement? for Wikimedia, but also for the platforms that Wikimedia
> interfaces with. Aka, what are the pros and cons of each?
>
> Lastly, welcome any general thoughts about the feasibility and need for
> such a project.
>
> Best,
> Jane
>
>
> Jane Park
> @janedaily
> Creative Commons | Los Angeles
>
> Make a donation to support CC in 2015: http://bit.ly/supportcc2015
>
Hi Jane.

We support both XMP and Exif (along with some other things, like PNG iTXt chunks and the older non-XMP version of IPTC). To be specific, we only accept properties we are already aware of. Since XMP is an open standard that anyone can add to, we won't recognize any properties not on our whitelist.

In the interface, we sometimes present all file metadata under the name "Exif" regardless of where it comes from, so depending on who you ask, people might say we only support Exif, which is untrue. It's just how we communicate this data to the user.

I was actually the one who added the initial support for XMP in MediaWiki as part of a Google Summer of Code project in 2010, so I'm intimately familiar with that part of the code.

So, on the subject of ingestion for license data: Sometimes there are properties across different standards with the same meaning. Sometimes we try to map them together, often letting one type of metadata overwrite another. We try to follow http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf

We generally only use the ingested metadata to display in a table on the image description page for extra information. We rely on the user to provide license info, and generally don't take that (or much else) from the file's metadata. The primary information about an image displayed by MediaWiki comes directly from the user, not from the file's metadata.

One exception to that is UploadWizard. It suggests some values based on the image's Exif data for date, author, and GPS location. It does not prefill the license at this time. I don't know much about how it works, or even whether it uses MediaWiki's file metadata extraction routines or its own thing.
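As a rough illustration of the whitelisting and cross-standard mapping described above, here is a minimal Python sketch. It is not MediaWiki's actual extraction code: the names `WHITELIST`, `PRECEDENCE`, and `merge_metadata`, the common field names, and above all the precedence order are assumptions made for the example. The MWG guidance defines its own, more detailed reconciliation rules.

```python
# Illustrative only: not MediaWiki's real extraction code.
# Hypothetical whitelist mapping (source, property) pairs to a common field.
WHITELIST = {
    ("exif", "Artist"): "author",
    ("exif", "Copyright"): "copyright",
    ("iptc", "Byline"): "author",
    ("iptc", "Copyright"): "copyright",
    ("xmp", "dc:creator"): "author",
    ("xmp", "dc:rights"): "copyright",
    ("xmp", "cc:license"): "license",
}

# Assumed precedence, lowest first: later sources overwrite earlier ones.
PRECEDENCE = ["iptc", "exif", "xmp"]


def merge_metadata(raw):
    """Merge per-source metadata into one dict of whitelisted fields.

    `raw` maps a source name ("exif", "iptc", "xmp") to a dict of
    property name -> value as read from the file.
    """
    merged = {}
    for source in PRECEDENCE:
        for prop, value in raw.get(source, {}).items():
            field = WHITELIST.get((source, prop))
            if field is None or not value:
                continue  # unknown property or empty value: ignore it
            merged[field] = value  # higher-precedence source overwrites
    return merged


example = {
    "exif": {"Artist": "A. Photographer", "MakerNote": "binary blob"},
    "xmp": {
        "dc:creator": "Alice Photographer",
        "cc:license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "example:unknownProp": "dropped",
    },
}
result = merge_metadata(example)
print(result)
```

In this example the XMP creator overwrites the Exif artist, and the two properties that are not on the whitelist are silently dropped.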
We recognize the following properties related to licensing/authorship:

Exif:
* Copyright
* Artist

PNG text chunks:
* Copyright
* Author
* Artist

Legacy IPTC:
* Copyright (2:116)
* Byline (2:80)
* Credit (2:110) [Although that doesn't mean what you normally think credit does]
* Contact (2:118)

XMP:
* XMP (using http://creativecommons.org/ns# namespace):
** license
** morePermissions
** attributionURL
** attributionName
* XMP (using http://ns.adobe.com/xap/1.0/rights/ namespace):
** Certificate
** Marked
** Owner
** UsageTerms
** WebStatement
* XMP (http://purl.org/dc/elements/1.1/):
** rights
** creator
** contributor
* Exif encoded as XMP (aka http://ns.adobe.com/tiff/1.0/ namespace):
** Artist
** Copyright
* XMP (new IPTC, http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/):
** CreatorContactInfo

Note: IPTC-as-XMP has some properties related to copyright/authorship that we don't support, such as Licensor.

Note: We do not extract CC tags from SVGs.

It's also possible I missed something here.

As for adding metadata to images:
* We serve the original image exactly as is if the user asks for the image in its original size. This includes all the original metadata, but we don't add any.
* If the user asks for a smaller image, we strip all metadata except the colour profile. We add a comment (JPEG: a JPEG COM segment; PNG: some sort of text chunk) saying "File source: <url to file>".

I personally think we should be adding copyright metadata to thumbnails (but I do not consider it a high priority), particularly for larger thumbs, where the overhead would be minimal.

XMP has the downside of not being that compact. XML is not very compact to begin with. Additionally, the official spec suggests that people add a lot of whitespace to allow in-place editing, and that they not use compression even if the format supports it, so that people can just scan the file. Most libraries that write it seem to do that, which is unfortunate from our perspective, where we are trying to minimize thumb size.
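To give a feel for the overhead discussed above, the following Python sketch builds a tiny XMP packet carrying two Creative Commons properties and compares its size with and without trailing whitespace padding. It is only an illustration: the 2048-byte padding figure, the sample values, and the helper name `build_packet` are assumptions for the example, not something taken from MediaWiki.

```python
# Illustrative only: compares an unpadded XMP packet with a padded one.
HEADER = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
TRAILER = '<?xpacket end="w"?>'

# Minimal RDF/XML body with two properties from the CC namespace.
PACKET_BODY = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="" '
    'xmlns:cc="http://creativecommons.org/ns#" '
    'cc:license="http://creativecommons.org/licenses/by-sa/4.0/" '
    'cc:attributionName="Alice Photographer"/>'
    '</rdf:RDF></x:xmpmeta>'
)


def build_packet(padding_bytes=0):
    """Return the packet as UTF-8 bytes, with optional whitespace padding."""
    padding = " " * padding_bytes  # padding exists to allow in-place edits
    return (HEADER + PACKET_BODY + padding + TRAILER).encode("utf-8")


compact = build_packet()
padded = build_packet(2048)  # assumed padding size, for illustration
print(len(compact), len(padded))
```

Even the compact packet is a few hundred bytes for two short values, and the padding alone adds a fixed cost on top, which matters most for very small thumbnails.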
If we were to start adding metadata to thumbs, I think we would start with Exif Artist and possibly an Exif Copyright field that has info on the license. I would be supportive of a system where, if the thumbnail is super small (say < 5 KB), we put nothing; if it is medium size (5-300 KB), we put in the two Exif fields I mentioned; and if it's > 300 KB, we put in author, copyright, GPS, and Creative Commons XMP tags. But others might feel that reducing thumb bandwidth is more important than the metadata, so it's something that would probably have to be discussed. (Not to mention, so far nobody has actually volunteered to code it...)

That said, I do not see what Creative Commons stands to gain from adding more metadata standards. The existing XMP fields seem good for people who want fine-grained info about the copyright of their image. It seems like new Exif fields would take a long time to propagate to implementations.

As for which standards are widely used: Anecdotally, there are probably a lot more people using Exif than anything else, but that's due to automatic support in digital cameras. It seems very few people explicitly mark their images with metadata.

I hope that helps.

--
-Brian

_______________________________________________
Multimedia mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/multimedia
