On Mon, 2007-06-18 at 13:20 -0700, Jason Kivlighn wrote:
> Whoops, I forgot the intro on this.
>
> This is my progress thus far with extracting licenses from various
> formats. Jamie, I'm curious about your thoughts on adding new extractors
> (besides the ones mentioned below, GIF is another I have in mind, though
> I'm not sure whether it's worthwhile). I don't want to be adding bloat...
>
> Cheers,
> Jason
Jamie, please drop us a line to discuss this project. Did you get the
chat time invite?

jon

> Jason Kivlighn wrote:
> > Hi,
> >
> > imagemagick: Uses 'convert filename xmp:-' to output an image's embedded
> > XMP. This works for at least JPEG and TIFF files. For JPEGs, however,
> > ImageMagick outputs the namespace and the XMP, separated by \0. I'm not
> > sure how to handle this without simply assuming that 'convert'
> > returned two null-terminated strings. Nevertheless, this extracts the
> > XMP from TIFF files.
> >
> > msoffice: Extends the msoffice extractor to also parse the
> > DocumentSummaryInformation stream, which contains user-defined metadata
> > along with license metadata embedded by the MSOffice Creative Commons
> > Add-in.
> >
> > pdf: Extends the pdf extractor to read a PDF's metadata stream and parse
> > it as XMP. I'm still waiting for poppler to extend its GLib bindings to
> > allow reading the metadata stream. Until then, it will simply never
> > find the metadata stream and will carry on without error.
> >
> > png: Adds a check for the XML:com.adobe.xmp iTXt field and parses it as
> > XMP.
> >
> > html: Adds a new HTML parser using libxml2. It parses the document,
> > checking for RDFa licenses. It also checks for other basic HTML
> > properties like title and author.
> >
> > There are also several XML formats I'd like to parse for license data,
> > particularly SVG and SMIL. Would this be doable, and if so, how should
> > I go about it? Should I write a new extractor for each format, or is
> > that too much overhead? These could use GMarkupParser, rather than
> > bringing in libxml2 like the HTML parser.
> >
> > Cheers,
> > Jason
>
> _______________________________________________
> tracker-list mailing list
> [EMAIL PROTECTED]
> http://mail.gnome.org/mailman/listinfo/tracker-list

--
Jon Phillips
San Francisco, CA USA
PH 510.499.0894
[EMAIL PROTECTED]
http://www.rejon.org
MSN, AIM, Yahoo Chat: kidproto
Jabber Chat: [EMAIL PROTECTED]
IRC: [EMAIL PROTECTED]

_______________________________________________
cc-devel mailing list
[email protected]
http://lists.ibiblio.org/mailman/listinfo/cc-devel
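On the ImageMagick question about the JPEG output: in a JPEG file, XMP is stored in an APP1 segment whose payload begins with the NUL-terminated identifier `http://ns.adobe.com/xap/1.0/`, which is most likely the "namespace" that shows up in front of the XMP. A minimal Python sketch, assuming the output of `convert filename xmp:-` has already been captured as bytes (running `convert` is left out), strips that prefix only when it is actually the known identifier instead of blindly assuming two null-terminated strings. The helper name `strip_xmp_namespace` is hypothetical.

```python
# Hypothetical helper: clean up the bytes printed by `convert FILE xmp:-`.
# Assumption: for JPEGs the output may begin with the APP1 XMP identifier
# "http://ns.adobe.com/xap/1.0/" followed by a NUL byte.

XMP_JPEG_NAMESPACE = b"http://ns.adobe.com/xap/1.0/"


def strip_xmp_namespace(raw: bytes) -> bytes:
    """Return only the XMP packet from convert's output."""
    head, sep, rest = raw.partition(b"\x00")
    if sep and head == XMP_JPEG_NAMESPACE:
        # JPEG case: drop the "<namespace>\0" prefix.
        raw = rest
    # A trailing NUL terminator, if present, is not part of the XML.
    return raw.rstrip(b"\x00")


if __name__ == "__main__":
    jpeg_style = XMP_JPEG_NAMESPACE + b"\x00<x:xmpmeta/>\x00"
    tiff_style = b"<x:xmpmeta/>"
    print(strip_xmp_namespace(jpeg_style))  # b'<x:xmpmeta/>'
    print(strip_xmp_namespace(tiff_style))  # b'<x:xmpmeta/>'
```

Output that does not start with the known identifier (the TIFF case, for example) passes through unchanged, so the same code path can serve both formats.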
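For the png extractor, the check for the XML:com.adobe.xmp iTXt field amounts to walking the chunk list. Each PNG chunk is a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC; an iTXt body is keyword\0, a compression flag byte, a compression method byte, a language tag\0, a translated keyword\0 and then UTF-8 text (zlib-compressed when the flag is 1). The Python sketch below is only an illustration of that layout, not the extractor's actual code: `find_png_xmp` and the test helper `make_chunk` are hypothetical names, and CRCs are not verified.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = b"XML:com.adobe.xmp"


def find_png_xmp(data: bytes):
    """Return the XMP text from a PNG's iTXt chunk, or None if absent."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk header: 4-byte big-endian length, then 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # skip header, data and CRC (CRC not checked)
        if ctype == b"IEND":
            break
        if ctype != b"iTXt":
            continue
        # iTXt: keyword\0, compression flag, compression method,
        # language tag\0, translated keyword\0, then the text.
        keyword, _, rest = body.partition(b"\x00")
        if keyword != XMP_KEYWORD or len(rest) < 2:
            continue
        compressed = rest[0] == 1
        _lang, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compressed:
            text = zlib.decompress(text)
        return text.decode("utf-8")
    return None


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk; handy for testing the reader above."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)
```

Typical use would be `find_png_xmp(open("photo.png", "rb").read())`, with the returned string handed to whatever XMP parser the extractor already uses.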
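On the SVG and SMIL question: a lightweight callback parser, in the spirit of GMarkupParser's start-element callback, is probably enough, because the license usually appears as an RDF block such as `<cc:license rdf:resource="..."/>` inside the file's metadata. The Python sketch below uses the standard library's expat binding as a stand-in for GMarkupParser. It assumes the license is expressed as `cc:license` with an `rdf:resource` attribute and accepts both the older `http://web.resource.org/cc/` and the newer `http://creativecommons.org/ns#` namespaces; `find_cc_licenses` is a hypothetical name.

```python
import xml.parsers.expat

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: accept both the older and the newer Creative Commons namespaces.
CC_NAMESPACES = {"http://web.resource.org/cc/", "http://creativecommons.org/ns#"}


def find_cc_licenses(xml_bytes: bytes) -> list:
    """Collect rdf:resource URIs of cc:license elements in an SVG/SMIL file."""
    licenses = []
    # With a namespace separator, expat reports names as "namespace-uri local".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def start_element(name, attrs):
        uri, _, local = name.rpartition(" ")
        if uri in CC_NAMESPACES and local == "license":
            resource = attrs.get(RDF_NS + " resource")
            if resource:
                licenses.append(resource)

    parser.StartElementHandler = start_element
    parser.Parse(xml_bytes, True)
    return licenses
```

As far as I know, GMarkupParser hands its callbacks the raw prefixed names (`cc:license`) without resolving namespaces, so a C version would need to track `xmlns:` declarations itself rather than relying on fixed prefixes.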
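For the html extractor, the most common RDFa license marker is a link such as `<a rel="license" href="...">`, which is what the Creative Commons license chooser emits. The sketch below swaps libxml2 for Python's standard `html.parser` purely for illustration and only recognizes `rel="license"` on `a` and `link` elements, plus `<title>` and `<meta name="author">`. Full RDFa processing (CURIEs, `property`, `about`) is out of scope. `LicenseScanner` and `scan_html` are hypothetical names.

```python
from html.parser import HTMLParser


class LicenseScanner(HTMLParser):
    """Collect rel="license" links, the <title> and <meta name="author">."""

    def __init__(self):
        super().__init__()
        self.licenses = []
        self.title = ""
        self.author = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "license nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag in ("a", "link") and "license" in rel_tokens and attrs.get("href"):
            self.licenses.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "author":
            self.author = attrs.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_html(text: str) -> LicenseScanner:
    scanner = LicenseScanner()
    scanner.feed(text)
    scanner.close()
    return scanner
```

Splitting `rel` into tokens matters because pages often combine values, and a plain string comparison against "license" would miss those links.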
