[ https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105325#comment-16105325 ]
Tim Allison commented on TIKA-2436:
-----------------------------------
With trunk, the file is parsed by the CompressorParser and then by the
EMFParser, so it seems to work. Is there actually any text in the example file?
{noformat}
0: X-Parsed-By : org.apache.tika.parser.DefaultParser
0: X-Parsed-By : org.apache.tika.parser.pkg.CompressorParser
0: X-TIKA:parse_time_millis : 337
0: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.CompressorParser" />
<meta name="Content-Type" content="application/gzip" />
<title></title>
</head>
<body><div class="package-entry" />
</body></html>
0: Content-Type : application/gzip
1: X-Parsed-By : org.apache.tika.parser.DefaultParser
1: X-Parsed-By : org.apache.tika.parser.microsoft.EMFParser
1: X-TIKA:parse_time_millis : 83
1: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.EMFParser" />
<meta name="X-TIKA:embedded_resource_path" content="/embedded-1" />
<meta name="Content-Type" content="image/emf" />
<title></title>
</head>
<body /></html>
1: X-TIKA:embedded_resource_path : /embedded-1
1: Content-Type : image/emf
{noformat}
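The EMFParser output above has an empty `<body />`. That is consistent with an EMF that carries no text-drawing records, though it does not prove it. A minimal Python sketch for checking by hand, assuming the file has already been gunzipped (for example with `gzip.decompress`): it walks the EMF record stream, where each record starts with a little-endian `iType` and `nSize`, and counts `EMR_EXTTEXTOUTA`/`EMR_EXTTEXTOUTW` records. The function name `count_text_records` is hypothetical and this is not Tika's API; it also ignores other places text can live, such as `EMR_POLYTEXTOUT*` records or EMF+ drawing commands embedded in `EMR_COMMENT` records.

```python
import struct

# Record type numbers from the EMF specification (MS-EMF).
EMR_EOF = 14
EMR_EXTTEXTOUTA = 83
EMR_EXTTEXTOUTW = 84


def count_text_records(emf: bytes) -> int:
    """Hypothetical helper: count ExtTextOut records in a decompressed EMF.

    Every EMF record begins with two little-endian uint32 fields:
    iType (record type) and nSize (record size in bytes, header included).
    """
    offset = 0
    count = 0
    while offset + 8 <= len(emf):
        record_type, size = struct.unpack_from("<II", emf, offset)
        if size < 8:
            # Malformed record: stop instead of looping forever.
            break
        if record_type in (EMR_EXTTEXTOUTA, EMR_EXTTEXTOUTW):
            count += 1
        if record_type == EMR_EOF:
            break
        offset += size
    return count
```

For the attached file this would be something like `count_text_records(gzip.decompress(open("image004.emz", "rb").read()))`; a result of 0 would suggest the empty output is expected rather than a parsing failure.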
> Support for GZIP-compressed EMF files
> -------------------------------------
>
> Key: TIKA-2436
> URL: https://issues.apache.org/jira/browse/TIKA-2436
> Project: Tika
> Issue Type: Improvement
> Components: mime, parser
> Affects Versions: 1.15
> Reporter: Matthew Caruana Galizia
> Attachments: image004.emz
>
>
> Tika is currently detecting EMZ (compressed EMF) files as simple gzip files.
> These files should instead be detected as EMF files and the EMFParser should
> perform decompression transparently.
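One way to tell an EMZ apart from an arbitrary gzip file is to look inside the compressed stream: an EMF begins with an `EMR_HEADER` record (`iType` = 1) whose `dSignature` field at byte offset 40 is `0x464D4520` (the bytes `" EMF"`). The Python sketch below decompresses only the first 64 bytes and checks those two fields. The helper names `looks_like_emf` and `sniff_gzip_payload` are hypothetical, and this is not how Tika's MIME detection is implemented; it only illustrates the kind of check the issue asks for.

```python
import gzip
import io
import struct

# An EMF starts with an EMR_HEADER record: iType == 1 (little-endian uint32)
# and the signature " EMF" (0x464D4520) at byte offset 40.
EMR_HEADER = 1
ENHMETA_SIGNATURE = 0x464D4520


def looks_like_emf(data: bytes) -> bool:
    """Return True if `data` begins with a plausible EMF header record."""
    if len(data) < 44:
        return False
    (record_type,) = struct.unpack_from("<I", data, 0)
    (signature,) = struct.unpack_from("<I", data, 40)
    return record_type == EMR_HEADER and signature == ENHMETA_SIGNATURE


def sniff_gzip_payload(raw: bytes) -> str:
    """Hypothetical helper: classify a gzip stream as EMZ ('image/emf') or plain gzip."""
    if raw[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Only a small prefix of the decompressed data is needed to see the EMF header.
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
        head = gz.read(64)
    return "image/emf" if looks_like_emf(head) else "application/gzip"
```

Reading only a short prefix keeps the check cheap even for large files, which matters if a detector has to run on every gzip stream it encounters.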