[
https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209162#comment-15209162
]
Ray Gauss II commented on TIKA-774:
-----------------------------------
bq. we should add a static check for whether exiftool is available and adjust
"handled" mimes at that point.
I think we'll find other areas to improve on as well. I just wanted to get the
ball rolling again on the contribution and review, as we had to close the source
on the stand-alone project mentioned above.
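For illustration only, a static availability check along the lines suggested could look roughly like the sketch below. It is not taken from the patch: the class name, the method names, and the MIME type list are hypothetical, and it assumes that running the executable with ExifTool's `-ver` option and getting a zero exit code is a sufficient test.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the attached patches.
final class ExifToolCheck {

    // Returns true if the given executable starts and exits with code 0
    // when asked for its version ("-ver" prints the ExifTool version).
    static boolean isAvailable(String executable) {
        ProcessBuilder pb = new ProcessBuilder(executable, "-ver");
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        try {
            Process p = pb.start();
            if (!p.waitFor(5, TimeUnit.SECONDS)) {
                p.destroyForcibly(); // hung process: treat as unavailable
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // executable not found or not runnable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Advertise no MIME types when ExifTool is missing, so that other
    // parsers are selected instead. The types listed are illustrative.
    static Set<String> supportedTypes(boolean available) {
        return available ? Set.of("image/jpeg", "image/tiff") : Set.of();
    }
}
```

A parser could evaluate `ExifToolCheck.isAvailable("exiftool")` once, cache the result in a static field, and return `supportedTypes(...)` from its supported-types method.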
bq. I should have a chance to look more closely early next week, but I doubt
there's reason to wait for my feedback.
We'd value your feedback, and since it's been over 4 years, we can wait a few
more weeks. :)
bq. Is this a replacement for the one I hacked together?
There's the possibility for the two to coexist, perhaps requiring this parser
to be explicitly called programmatically.
At a high level the biggest differences are:
# As mentioned in TIKA-1639, there's an extensive mapping from ExifTool's
namespace to proper Tika properties (currently done programmatically)
# It includes the ability to embed, i.e. write metadata back into binary files.
(TIKA-776)
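To give a rough idea of what a programmatic mapping from ExifTool's namespace to Tika property names involves, here is a minimal sketch. It is not the code from the patch: the class name is hypothetical, the two table entries are illustrative assumptions, and plain strings stand in for Tika's Property and Metadata types so that the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not the ExiftoolTikaMapper from the patch.
final class ExifToolNameMapper {

    // ExifTool "Group:Tag" name -> Tika-style property name.
    // Both entries are assumptions chosen for the example.
    private static final Map<String, String> EXIFTOOL_TO_TIKA = Map.of(
            "XMP-dc:Title", "dc:title",
            "XMP-dc:Description", "dc:description");

    // Copies every field that has a known mapping, renaming its key;
    // fields without a mapping are dropped in this sketch.
    static Map<String, String> map(Map<String, String> exifToolFields) {
        Map<String, String> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : exifToolFields.entrySet()) {
            String tikaName = EXIFTOOL_TO_TIKA.get(field.getKey());
            if (tikaName != null) {
                mapped.put(tikaName, field.getValue());
            }
        }
        return mapped;
    }
}
```

Whether unmapped fields should be dropped or passed through under their ExifTool names is a design choice that this sketch does not try to settle.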
> ExifTool Parser
> ---------------
>
> Key: TIKA-774
> URL: https://issues.apache.org/jira/browse/TIKA-774
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.0
> Environment: Requires ExifTool to be installed
> (http://www.sno.phy.queensu.ca/~phil/exiftool/)
> Reporter: Ray Gauss II
> Labels: features, new-parser, newbie, patch
> Fix For: 1.13
>
> Attachments: testJPEG_IPTC_EXT.jpg,
> tika-core-exiftool-parser-patch.txt, tika-parsers-exiftool-parser-patch.txt
>
>
> Adds an external parser that calls ExifTool to extract extended metadata
> fields from images and other content types.
> In the core project:
> An ExifTool interface is added which contains Property objects that define
> the metadata fields available.
> An additional Property constructor is added for the internalTextBag type.
> In the parsers project:
> An ExiftoolMetadataExtractor is added which does the work of calling ExifTool
> on the command line and mapping the response to tika metadata fields. This
> extractor could be called instead of or in addition to the existing
> ImageMetadataExtractor and JempboxExtractor under TiffParser and/or
> JpegParser but those have not been changed at this time.
> An ExiftoolParser is added which calls only the ExiftoolMetadataExtractor.
> An ExiftoolTikaMapper is added which is responsible for mapping the ExifTool
> metadata fields to existing tika and Drew Noakes metadata fields if enabled.
> An ElementRdfBagMetadataHandler is added for extracting multi-valued RDF Bag
> implementations in XML files.
> An ExifToolParserTest is added which tests several expected XMP and IPTC
> metadata values in testJPEG_IPTC_EXT.jpg.