[ https://issues.apache.org/jira/browse/TIKA-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Kincaid updated TIKA-2410:
-------------------------------
Attachment: sample-rtf.rtf
This RTF file exhibits the behavior described in the bug.
> RTF parser is tagging non-bold text as bold
> -------------------------------------------
>
> Key: TIKA-2410
> URL: https://issues.apache.org/jira/browse/TIKA-2410
> Project: Tika
> Issue Type: Bug
> Reporter: Dave Kincaid
> Attachments: sample-rtf.rtf
>
>
> While parsing some RTF files, I'm finding that the RTF parser tags many text
> spans as bold even though they are not. I am attaching a sample RTF file that
> exhibits this behavior. When parsing the file, the first line is correctly
> tagged as bold. However, the second line (the phone number), which is not
> supposed to be bold, is also tagged as bold.
> The following code demonstrates the problem:
> {code:java}
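> import java.io.InputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.Parser;
> import org.apache.tika.parser.rtf.RTFParser;
> import org.apache.tika.sax.ToXMLContentHandler;
> import org.xml.sax.ContentHandler;
>
> InputStream inputStream =
>     Thread.currentThread().getContextClassLoader()
>         .getResourceAsStream("sample-rtf.rtf");
> Parser parser = new RTFParser();
> // Serialize the parser's SAX events as XML so the bold markup is visible.
> ContentHandler contentHandler = new ToXMLContentHandler();
> Metadata metadata = new Metadata();
> ParseContext context = new ParseContext();
> parser.parse(inputStream, contentHandler, metadata, context);
> String xml = contentHandler.toString();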
> {code}
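
To make it easier to see which spans end up tagged as bold, the `xml` string produced above could be fed to a small helper like the one below. This is only a sketch and is not part of Tika: the class name `BoldSpanLister` and the method `boldSpans` are hypothetical, the sample input in `main` is made up for illustration, and it assumes the parser marks bold text with `<b>` elements in its XHTML output. It uses only the JDK's SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not a Tika class): lists the text inside <b> elements.
public class BoldSpanLister {

    /** Returns the text of each outermost b element, in document order. */
    public static List<String> boldSpans(String xml) throws Exception {
        List<String> spans = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness lets us match on the local name "b" even when
        // the document declares the XHTML namespace as its default.
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                private int boldDepth = 0;
                private final StringBuilder current = new StringBuilder();

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("b".equals(localName)) {
                        boldDepth++;
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // Collect text only while inside at least one b element.
                    if (boldDepth > 0) {
                        current.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    // Emit a span when the outermost b element closes.
                    if ("b".equals(localName) && --boldDepth == 0) {
                        spans.add(current.toString());
                        current.setLength(0);
                    }
                }
            });
        return spans;
    }

    public static void main(String[] args) throws Exception {
        // Made-up input for illustration; it is not the output for sample-rtf.rtf.
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<p><b>Name</b></p><p>555-0100</p></body></html>";
        System.out.println(boldSpans(sample)); // prints [Name]
    }
}
```

Calling `BoldSpanLister.boldSpans(xml)` on the output for the attached file should list the phone number among the bold spans if the behavior described above is reproduced.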
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)