[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267101#comment-14267101 ]

Tim Allison edited comment on TIKA-1445 at 1/7/15 1:13 AM:
-----------------------------------------------------------

I'm sorry that I haven't had a chance to kick the tires on the fix for this issue. This may be a case of user error; perhaps I have to twiddle with the parser config file?

I found that the current fix (with default configuration) is not pulling metadata from embedded image files in tika-trunk or tika-1.7-rc2. Test doc from govdocs1 attached.

We should be extracting these values (at least) in the embedded tiff:
{noformat}
"Data Precision":"8 bits","Image Height":"169 pixels","Image Width":"752 pixels","Number of Components":"3","Resolution Units":"inch","X Resolution":"300 dots","Y Resolution":"300 dots","resourceName":"image1.jpg","tiff:BitsPerSample":"8","tiff:ImageLength":"169","tiff:ImageWidth":"752","tika.mime.file":"image1.jpg"
{noformat}


was (Author: talli...@mitre.org):
I'm sorry that I haven't had a chance to kick the tires on the fix for this issue. I just discovered that the current fix is not pulling metadata from embedded image files in tika-trunk or tika-1.7-rc2. Test doc from govdocs1 attached.

We should be extracting these values (at least) in the embedded tiff:
{noformat}
"Data Precision":"8 bits","Image Height":"169 pixels","Image Width":"752 pixels","Number of Components":"3","Resolution Units":"inch","X Resolution":"300 dots","Y Resolution":"300 dots","resourceName":"image1.jpg","tiff:BitsPerSample":"8","tiff:ImageLength":"169","tiff:ImageWidth":"752","tika.mime.file":"image1.jpg"
{noformat}

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: 000003.doc, TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt,
> TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back in the metadata extraction capabilities by the other
> Image parsers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)