[https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827191#comment-17827191]
Tim Allison commented on TIKA-4210:
-----------------------------------
Nick is right. The file is an RTF file. Tika does find two embedded files, both
identified as image/x-rtf-raw-bitmap. I don't think we have a parser for that
format, so both are handled by EmptyParser (see X-TIKA:Parsed-By below).
{code:json}
[
{
"Content-Length": "19619",
"Content-Type": "application/rtf",
"X-TIKA:Parsed-By": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.microsoft.rtf.RTFParser"
],
"X-TIKA:Parsed-By-Full-Set": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.microsoft.rtf.RTFParser",
"org.apache.tika.parser.EmptyParser"
],
"X-TIKA:content": "...",
"X-TIKA:content_handler": "ToTextContentHandler",
"X-TIKA:embedded_depth": "0",
"X-TIKA:parse_time_millis": "143",
"resourceName": "sample.DOC.rtf"
},
{
"Content-Length": "52",
"Content-Type": "image/x-rtf-raw-bitmap",
"Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
"X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
"X-TIKA:embedded_depth": "1",
"X-TIKA:embedded_id": "1",
"X-TIKA:embedded_id_path": "/1",
"X-TIKA:embedded_resource_path": "/file_0",
"X-TIKA:parse_time_millis": "1",
"resourceName": "file_0",
"rtf_meta:thumbnail": "false"
},
{
"Content-Length": "154",
"Content-Type": "image/x-rtf-raw-bitmap",
"Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
"X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
"X-TIKA:embedded_depth": "1",
"X-TIKA:embedded_id": "2",
"X-TIKA:embedded_id_path": "/2",
"X-TIKA:embedded_resource_path": "/file_1",
"X-TIKA:parse_time_millis": "0",
"resourceName": "file_1",
"rtf_meta:thumbnail": "false"
}
] {code}
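To check this on other files, look at the recursive output. An embedded document whose X-TIKA:Parsed-By is EmptyParser was detected (here as image/x-rtf-raw-bitmap), but no parser was available to extract its content. Below is a minimal sketch of that check. It assumes the metadata list has already been loaded into plain Java maps. UnparsedEmbeddedFinder, findUnparsed and sampleFromIssue are hypothetical names, not Tika API.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: scans the per-document metadata list
// from a recursive parse (like the JSON above) and reports embedded documents
// whose type was detected but which were handled by EmptyParser.
class UnparsedEmbeddedFinder {

    static final String PARSED_BY = "X-TIKA:Parsed-By";
    static final String DEPTH = "X-TIKA:embedded_depth";
    static final String EMPTY_PARSER = "org.apache.tika.parser.EmptyParser";

    // Returns "resourceName (Content-Type)" for each embedded document whose
    // X-TIKA:Parsed-By values include EmptyParser.
    static List<String> findUnparsed(List<Map<String, List<String>>> metadataList) {
        List<String> result = new ArrayList<>();
        for (Map<String, List<String>> md : metadataList) {
            // The container document has depth 0; embedded documents have depth >= 1.
            int depth = Integer.parseInt(first(md, DEPTH, "0"));
            List<String> parsedBy = md.getOrDefault(PARSED_BY, List.of());
            if (depth > 0 && parsedBy.contains(EMPTY_PARSER)) {
                result.add(first(md, "resourceName", "?") + " (" + first(md, "Content-Type", "?") + ")");
            }
        }
        return result;
    }

    // First value for a key, or a default when the key is absent or empty.
    private static String first(Map<String, List<String>> md, String key, String dflt) {
        List<String> values = md.get(key);
        return (values == null || values.isEmpty()) ? dflt : values.get(0);
    }

    // A subset of the keys from the JSON output above, one map per document.
    static List<Map<String, List<String>>> sampleFromIssue() {
        return List.of(
            Map.of("Content-Type", List.of("application/rtf"),
                   PARSED_BY, List.of("org.apache.tika.parser.DefaultParser",
                                      "org.apache.tika.parser.microsoft.rtf.RTFParser"),
                   DEPTH, List.of("0"),
                   "resourceName", List.of("sample.DOC.rtf")),
            Map.of("Content-Type", List.of("image/x-rtf-raw-bitmap"),
                   PARSED_BY, List.of(EMPTY_PARSER),
                   DEPTH, List.of("1"),
                   "resourceName", List.of("file_0")),
            Map.of("Content-Type", List.of("image/x-rtf-raw-bitmap"),
                   PARSED_BY, List.of(EMPTY_PARSER),
                   DEPTH, List.of("1"),
                   "resourceName", List.of("file_1")));
    }

    public static void main(String[] args) {
        // Prints one line each for file_0 and file_1.
        findUnparsed(sampleFromIssue()).forEach(System.out::println);
    }
}
{code}

In practice the list would come from a recursive parse, for example tika-app's -J output or RecursiveParserWrapper, rather than being hard-coded. The root document is skipped because only embedded documents (depth >= 1) are of interest here.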
> Not able to identify tika extension
> -----------------------------------
>
> Key: TIKA-4210
> URL: https://issues.apache.org/jira/browse/TIKA-4210
> Project: Tika
> Issue Type: Bug
> Reporter: Tika User
> Priority: Major
> Attachments: sample.DOC
>
>
> Hi Team,
> The attached file contains embedded .MPGA attachments whose extension Tika is
> not able to identify. I tried Tika versions 2.9.0 and 2.9.1, and it is still
> shown as empty. Please look into this.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)