Hi,

When I added support for more image metadata in TIKA-472, I realized the current design had some restrictions:

* I could not access the typed getters from Metadata Extractor, such as getDate (to format ISO dates) and getStringArray (for keywords).
* The handler function was called one field at a time, which prevents logic where one field depends on the value of another (there are, for example, record versions and fields that specify the encoding).

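To illustrate the second restriction, here is a minimal sketch in plain Java. It is not Tika or Metadata Extractor code: the class, the method names, and the "charset" and "caption" field names are all made up for the example. It only shows why a handler that sees one field at a time cannot honor a field that declares the encoding of the others, while an extractor that sees the whole record can.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class WholeRecordSketch {

    // Hypothetical field names, for illustration only.
    static final String CHARSET_FIELD = "charset";
    static final String CAPTION_FIELD = "caption";

    /** Per-field style: the value is decoded in isolation with an assumed default charset. */
    static String decodePerField(String name, byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Whole-record style: look up the declared charset first, then decode the other fields. */
    static Map<String, String> decodeWholeRecord(Map<String, byte[]> fields) {
        Charset charset = StandardCharsets.ISO_8859_1; // assumed default
        byte[] declared = fields.get(CHARSET_FIELD);
        if (declared != null) {
            try {
                charset = Charset.forName(new String(declared, StandardCharsets.US_ASCII));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: keep the default.
            }
        }
        Map<String, String> decoded = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> field : fields.entrySet()) {
            if (!CHARSET_FIELD.equals(field.getKey())) {
                decoded.put(field.getKey(), new String(field.getValue(), charset));
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        Map<String, byte[]> fields = new LinkedHashMap<>();
        fields.put(CHARSET_FIELD, "UTF-8".getBytes(StandardCharsets.US_ASCII));
        fields.put(CAPTION_FIELD, "Malm\u00f6".getBytes(StandardCharsets.UTF_8));

        // The per-field result is mis-decoded; the whole-record result is not.
        System.out.println(decodePerField(CAPTION_FIELD, fields.get(CAPTION_FIELD)));
        System.out.println(decodeWholeRecord(fields).get(CAPTION_FIELD));
    }
}
```

The same argument applies to record versions: whichever field controls how the others are interpreted has to be read before the rest are handled.
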
I also think it would be clearer if a Parser is per file format and an Extractor is per library used. I refactored TiffExtractor to MetadataExtractorExtractor. We also use ImageIO in the TIFF parser, so maybe there should be such an extractor too. I'm also looking for an XMP library in Java so we can have an extractor for those fields from all kinds of images, including those from Adobe programs.

This refactoring allowed me to get dates properly; see the comment in https://issues.apache.org/jira/browse/TIKA-451.

Current version of the class:
http://github.com/solsson/tika/blob/b25218ed728b727bea71b0799c358f78d6df8c08/tika-parsers/src/main/java/org/apache/tika/parser/image/MetadataExtractorExtractor.java

The tests pass pretty much unchanged. Should I create a patch and a ticket for this?

/Staffan
