[ 
https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827191#comment-17827191
 ] 

Tim Allison commented on TIKA-4210:
-----------------------------------

Nick is right: the file is an RTF file. Tika does find two embedded files, 
both identified as image/x-rtf-raw-bitmap. As far as I know, we don't have a 
parser for that format.
{code:java}
[
    {
        "Content-Length": "19619",
        "Content-Type": "application/rtf",
        "X-TIKA:Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.microsoft.rtf.RTFParser"
        ],
        "X-TIKA:Parsed-By-Full-Set": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.microsoft.rtf.RTFParser",
            "org.apache.tika.parser.EmptyParser"
        ],
        "X-TIKA:content": "...",
        "X-TIKA:content_handler": "ToTextContentHandler",
        "X-TIKA:embedded_depth": "0",
        "X-TIKA:parse_time_millis": "143",
        "resourceName": "sample.DOC.rtf"
    },
    {
        "Content-Length": "52",
        "Content-Type": "image/x-rtf-raw-bitmap",
        "Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
        "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
        "X-TIKA:embedded_depth": "1",
        "X-TIKA:embedded_id": "1",
        "X-TIKA:embedded_id_path": "/1",
        "X-TIKA:embedded_resource_path": "/file_0",
        "X-TIKA:parse_time_millis": "1",
        "resourceName": "file_0",
        "rtf_meta:thumbnail": "false"
    },
    {
        "Content-Length": "154",
        "Content-Type": "image/x-rtf-raw-bitmap",
        "Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
        "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
        "X-TIKA:embedded_depth": "1",
        "X-TIKA:embedded_id": "2",
        "X-TIKA:embedded_id_path": "/2",
        "X-TIKA:embedded_resource_path": "/file_1",
        "X-TIKA:parse_time_millis": "0",
        "resourceName": "file_1",
        "rtf_meta:thumbnail": "false"
    }
]
{code}
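
To pick out the embedded entries that no real parser handled, the metadata above can be filtered on X-TIKA:Parsed-By. The following is only a minimal sketch: the function name unparsed_entries is hypothetical, and the JSON is a trimmed copy of the output above, inlined so the snippet runs on its own. It assumes the field may hold either a single string or a list, as both shapes appear in the output.

```python
import json

# Trimmed copy of the metadata shown above; only the fields the filter needs.
TIKA_JSON = """
[
  {"Content-Type": "application/rtf",
   "X-TIKA:Parsed-By": ["org.apache.tika.parser.DefaultParser",
                        "org.apache.tika.parser.microsoft.rtf.RTFParser"],
   "resourceName": "sample.DOC.rtf"},
  {"Content-Type": "image/x-rtf-raw-bitmap",
   "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
   "X-TIKA:embedded_resource_path": "/file_0",
   "resourceName": "file_0"},
  {"Content-Type": "image/x-rtf-raw-bitmap",
   "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
   "X-TIKA:embedded_resource_path": "/file_1",
   "resourceName": "file_1"}
]
"""

EMPTY_PARSER = "org.apache.tika.parser.EmptyParser"


def unparsed_entries(records):
    """Return (resource path, content type) for entries handled only by EmptyParser.

    Hypothetical helper for illustration; it is not part of Tika.
    """
    result = []
    for rec in records:
        parsed_by = rec.get("X-TIKA:Parsed-By", [])
        # The field is a plain string when there is one parser, a list otherwise.
        if isinstance(parsed_by, str):
            parsed_by = [parsed_by]
        if parsed_by == [EMPTY_PARSER]:
            path = rec.get("X-TIKA:embedded_resource_path", rec.get("resourceName"))
            result.append((path, rec.get("Content-Type")))
    return result


if __name__ == "__main__":
    for path, content_type in unparsed_entries(json.loads(TIKA_JSON)):
        print(path, content_type)
```

For the output above, this lists the two image/x-rtf-raw-bitmap entries and skips the RTF container itself, since RTFParser handled that one.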

> Not able to identify tika extension
> -----------------------------------
>
>                 Key: TIKA-4210
>                 URL: https://issues.apache.org/jira/browse/TIKA-4210
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tika User
>            Priority: Major
>         Attachments: sample.DOC
>
>
> Hi Team,
> The attached embedded file contains .MPGA attachments whose extension Tika is 
> not able to identify. I tried Tika versions 2.9.0 and 2.9.1, and both still 
> show it as empty. Please look into this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
