[ 
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105325#comment-16105325
 ] 

Tim Allison commented on TIKA-2436:
-----------------------------------

On trunk, the file is parsed first by the CompressorParser and then by the 
EMFParser, so this seems to work.  Is there actually any text in the example file?

{noformat}
0: X-Parsed-By : org.apache.tika.parser.DefaultParser
0: X-Parsed-By : org.apache.tika.parser.pkg.CompressorParser
0: X-TIKA:parse_time_millis : 337
0: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.CompressorParser" />
<meta name="Content-Type" content="application/gzip" />
<title></title>
</head>
<body><div class="package-entry" />
</body></html>
0: Content-Type : application/gzip
1: X-Parsed-By : org.apache.tika.parser.DefaultParser
1: X-Parsed-By : org.apache.tika.parser.microsoft.EMFParser
1: X-TIKA:parse_time_millis : 83
1: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.EMFParser" />
<meta name="X-TIKA:embedded_resource_path" content="/embedded-1" />
<meta name="Content-Type" content="image/emf" />
<title></title>
</head>
<body /></html>
1: X-TIKA:embedded_resource_path : /embedded-1
1: Content-Type : image/emf
{noformat}

> Support for GZIP-compressed EMF files
> -------------------------------------
>
>                 Key: TIKA-2436
>                 URL: https://issues.apache.org/jira/browse/TIKA-2436
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>    Affects Versions: 1.15
>            Reporter: Matthew Caruana Galizia
>         Attachments: image004.emz
>
>
> Tika is currently detecting EMZ (compressed EMF) files as simple gzip files. 
> These files should instead be detected as EMF files and the EMFParser should 
> perform decompression transparently.
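
For illustration only, the detection requested above could be sketched with
nothing but the JDK: check for the gzip magic bytes, decompress just the first
44 bytes, and look for the EMF header record (record type 1, little-endian,
with the " EMF" signature at offset 40). The class EmzSniffer and its methods
are hypothetical names; this is not Tika's detector API and not necessarily
how Tika would implement it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Tika.
public class EmzSniffer {
    // The EMF header record stores its signature (" EMF") at byte offset 40.
    private static final int SIGNATURE_OFFSET = 40;
    private static final byte[] EMF_SIGNATURE = {0x20, 0x45, 0x4D, 0x46};
    private static final int HEADER_LENGTH = SIGNATURE_OFFSET + EMF_SIGNATURE.length;

    /** True if the data starts with the gzip magic bytes 0x1f 0x8b. */
    public static boolean isGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    /** True if the bytes look like the start of an EMF header record. */
    public static boolean isEmf(byte[] header) {
        if (header.length < HEADER_LENGTH) {
            return false;
        }
        // Record type must be 1 (EMR_HEADER), stored little-endian.
        if (header[0] != 1 || header[1] != 0 || header[2] != 0 || header[3] != 0) {
            return false;
        }
        for (int i = 0; i < EMF_SIGNATURE.length; i++) {
            if (header[SIGNATURE_OFFSET + i] != EMF_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    /** True if the data is gzip-compressed and the payload starts like an EMF. */
    public static boolean isEmz(byte[] data) throws IOException {
        if (!isGzip(data)) {
            return false;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            // Decompress only as much as is needed to inspect the header.
            byte[] header = new byte[HEADER_LENGTH];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == header.length && isEmf(header);
        }
    }

    /** Gzip-compresses a byte array; used here only to build sample input. */
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }
}
```

Passing the bytes of an .emz file to EmzSniffer.isEmz would return true under
these assumptions, while an ordinary .gz archive would return false and could
still be handed to the CompressorParser as before.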



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
