Ancuta Morarasu created TIKA-2630:
-------------------------------------

             Summary: Wrong height and width metadata for JPEG images
                 Key: TIKA-2630
                 URL: https://issues.apache.org/jira/browse/TIKA-2630
             Project: Tika
          Issue Type: Bug
            Reporter: Ancuta Morarasu

According to the [Exif specs|http://www.exif.org/Exif2-2.PDF#page=73&zoom=auto,-176,103], for compressed images the values for width and height should come from the tags:
* *PixelXDimension*, mapped in metadata-extractor to {{com.drew.metadata.exif.ExifDirectoryBase.TAG_EXIF_IMAGE_WIDTH}}, and
* *PixelYDimension*, mapped to {{ExifDirectoryBase.TAG_EXIF_IMAGE_HEIGHT}}.

{{ImageMetadataExtractor$ExifHandler.[handlePhotoTags(...)|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java#L487]}} should extract these tags and set them in the metadata:

{code:java}
// PixelXDimension: the valid width of the compressed image
if (directory.containsTag(ExifSubIFDDirectory.TAG_EXIF_IMAGE_WIDTH)) {
    metadata.set(Metadata.IMAGE_WIDTH,
            trimPixels(directory.getDescription(ExifSubIFDDirectory.TAG_EXIF_IMAGE_WIDTH)));
}
// PixelYDimension: the valid height; guard on the height tag, not the width tag
if (directory.containsTag(ExifSubIFDDirectory.TAG_EXIF_IMAGE_HEIGHT)) {
    metadata.set(Metadata.IMAGE_LENGTH,
            trimPixels(directory.getDescription(ExifSubIFDDirectory.TAG_EXIF_IMAGE_HEIGHT)));
}
{code}

Also, the {{CopyUnknownFieldsHandler}} overrides the values for "Image Width" ({{JpegDirectory.TAG_IMAGE_WIDTH}}) and "Image Height" ({{JpegDirectory.TAG_IMAGE_HEIGHT}}) with the values from {{ExifIFD0Descriptor.TAG_IMAGE_WIDTH}} and {{ExifIFD0Descriptor.TAG_IMAGE_HEIGHT}}, because the tags have the same name in both directories.

I attached a sample image; these are the metadata values:
* extracted by metadata-extractor:
[JPEG] Image Height = 367 pixels
[JPEG] Image Width = 1535 pixels
[Exif IFD0] Image Width = 2173 pixels
[Exif IFD0] Image Height = 520 pixels
[Exif SubIFD] Exif Image Width = 1535 pixels
[Exif SubIFD] Exif Image Height = 367 pixels
* Tika metadata:
Image Height: 520 pixels
Image Width: 2173 pixels
tiff:ImageLength: 520
tiff:ImageWidth: 2173
Exif Image Height: 367 pixels
Exif Image Width: 1535 pixels
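
To make the two problems above concrete, here is a minimal, standard-library-only sketch that replays the sample values from this report. It does not use the Tika or metadata-extractor APIs: the class {{ExifDimensionSketch}}, its methods, the nested-map representation of the directories, and the fallback order (Exif SubIFD, then JPEG, then Exif IFD0) are all assumptions made for illustration, and {{trimPixels}} here is only a stand-in for Tika's helper of the same name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Tika's or metadata-extractor's API.
final class ExifDimensionSketch {

    // {directory name, tag name} pairs in assumed order of preference.
    private static final String[][] WIDTH_SOURCES = {
        {"Exif SubIFD", "Exif Image Width"},   // PixelXDimension
        {"JPEG", "Image Width"},
        {"Exif IFD0", "Image Width"},
    };
    private static final String[][] HEIGHT_SOURCES = {
        {"Exif SubIFD", "Exif Image Height"},  // PixelYDimension
        {"JPEG", "Image Height"},
        {"Exif IFD0", "Image Height"},
    };

    // Stand-in helper: strips a trailing " pixels" from a tag description.
    static String trimPixels(String description) {
        if (description == null) {
            return null;
        }
        String s = description.trim();
        String suffix = " pixels";
        return s.endsWith(suffix) ? s.substring(0, s.length() - suffix.length()) : s;
    }

    // Returns the first value found when walking the sources in order, or null.
    static String resolve(Map<String, Map<String, String>> directories, String[][] sources) {
        for (String[] source : sources) {
            Map<String, String> tags = directories.get(source[0]);
            if (tags != null && tags.containsKey(source[1])) {
                return trimPixels(tags.get(source[1]));
            }
        }
        return null;
    }

    static String width(Map<String, Map<String, String>> directories) {
        return resolve(directories, WIDTH_SOURCES);
    }

    static String height(Map<String, Map<String, String>> directories) {
        return resolve(directories, HEIGHT_SOURCES);
    }

    // Copies every tag into one map keyed by tag name only, so a later
    // directory silently overwrites an earlier one with the same tag name.
    static Map<String, String> flattenByTagName(Map<String, Map<String, String>> directories) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map<String, String> tags : directories.values()) {
            flat.putAll(tags);
        }
        return flat;
    }

    // The values reported for the attached sample image.
    static Map<String, Map<String, String>> sample() {
        Map<String, String> jpeg = new LinkedHashMap<>();
        jpeg.put("Image Height", "367 pixels");
        jpeg.put("Image Width", "1535 pixels");
        Map<String, String> ifd0 = new LinkedHashMap<>();
        ifd0.put("Image Width", "2173 pixels");
        ifd0.put("Image Height", "520 pixels");
        Map<String, String> subIfd = new LinkedHashMap<>();
        subIfd.put("Exif Image Width", "1535 pixels");
        subIfd.put("Exif Image Height", "367 pixels");

        Map<String, Map<String, String>> directories = new LinkedHashMap<>();
        directories.put("JPEG", jpeg);
        directories.put("Exif IFD0", ifd0);
        directories.put("Exif SubIFD", subIfd);
        return directories;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> dirs = sample();
        // Name collision: the IFD0 value replaces the JPEG value.
        System.out.println("flattened Image Width = " + flattenByTagName(dirs).get("Image Width"));
        // Directory-aware lookup: the SubIFD (PixelXDimension/PixelYDimension) values win.
        System.out.println("resolved width  = " + width(dirs));
        System.out.println("resolved height = " + height(dirs));
    }
}
```

Keying the lookup by directory as well as tag name is what keeps the JPEG and Exif IFD0 values apart in this sketch; whether the JPEG directory should rank above Exif IFD0 as a fallback is a design choice that would need to be settled in the actual fix.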