[ 
https://issues.apache.org/jira/browse/TIKA-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Kincaid updated TIKA-2410:
-------------------------------
    Attachment: sample-rtf.rtf

This RTF file exhibits the behavior described in this issue.

> RTF parser is tagging non-bold text as bold
> -------------------------------------------
>
>                 Key: TIKA-2410
>                 URL: https://issues.apache.org/jira/browse/TIKA-2410
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Dave Kincaid
>         Attachments: sample-rtf.rtf
>
>
> While parsing some RTF files, I'm finding that the RTF parser tags many text 
> spans as bold even when they are not. I am attaching a sample RTF file that 
> exhibits this behavior. When the file is parsed, the first line is correctly 
> tagged as bold. However, the second line (the phone number), which is not 
> supposed to be bold, is also tagged as bold.
> The following code demonstrates the problem.
> {code:java}
>         InputStream inputStream = Thread.currentThread().getContextClassLoader()
>                 .getResourceAsStream("sample-rtf.rtf");
>         Parser parser = new RTFParser();
>         ContentHandler contentHandler = new ToXMLContentHandler();
>         Metadata metadata = new Metadata();
>         ParseContext context = new ParseContext();
>         parser.parse(inputStream, contentHandler, metadata, context);
>         String xml = contentHandler.toString();
> {code}
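
To see which spans the parser marked as bold, the XML string produced by the snippet above can be scanned for bold elements. The following is only a rough sketch: it assumes bold runs appear as plain <b>...</b> elements in the ToXMLContentHandler output, and the class name BoldSpans, the method boldSpans, and the sample strings in main are hypothetical and not taken from the attached file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoldSpans {
    // Assumption: bold runs are serialized as <b>...</b> without attributes.
    private static final Pattern BOLD =
            Pattern.compile("<b>(.*?)</b>", Pattern.DOTALL);

    // Returns the raw content of every <b> element, in document order.
    // Any markup nested inside a <b> element is returned unchanged.
    static List<String> boldSpans(String xml) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = BOLD.matcher(xml);
        while (matcher.find()) {
            spans.add(matcher.group(1));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hypothetical output fragment, not the real content of sample-rtf.rtf.
        String xml = "<p><b>Heading</b></p><p>555-0100</p>";
        for (String span : boldSpans(xml)) {
            System.out.println("bold: " + span);
        }
    }
}
```

Passing the xml string from the reproduction code to boldSpans(xml) gives a quick list of everything tagged as bold, which can then be compared against how the document looks in a word processor.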



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
