[ https://issues.apache.org/jira/browse/TIKA-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-1067:
------------------------------------
Component/s: parser
> Tika extracts non-existent asterisks (*) from .ppt files
> --------------------------------------------------------
>
> Key: TIKA-1067
> URL: https://issues.apache.org/jira/browse/TIKA-1067
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Michael McCandless
>
> I created a new blank presentation, put in title + subtitle, saved it as
> .ppt, and then ran TikaCLI -t:
> {noformat}
> <body><div class="slideShow"><div class="slide"><p
> class="slide-master-content">*<br/>
> *<br/>
> </p>
> <p class="slide-content">Testing<br/>
> testing<br/>
> </p>
> </div>
> </div>
> <div class="slideNotes"/>
> {noformat}
> The two extra asterisks seem to come from the master slide, but I'm not sure
> which text runs they belong to or how to suppress them ...
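One way to check which output lines are the stray characters is to inspect the XHTML quoted above and pull out only the `slide-master-content` paragraph. The sketch below is a minimal plain-JDK illustration of that idea. It assumes the exact layout shown in the report, where each line ends in `<br/>` and the master text sits in `<p class="slide-master-content">`. The class and method names (`MasterContentCheck`, `masterLines`, `countAsteriskOnly`) are hypothetical. This is a regex check on the extracted output, not Tika's parser code and not a fix for the bug. A regex is only adequate here because the markup is fixed and simple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MasterContentCheck {
    // Matches the body of each <p class="slide-master-content">...</p>, as in the quoted output.
    // \s+ allows the line break between "<p" and "class" seen in the report; DOTALL lets .*? span lines.
    private static final Pattern MASTER_P = Pattern.compile(
            "<p\\s+class=\"slide-master-content\">(.*?)</p>", Pattern.DOTALL);

    /** Returns the non-empty text lines of every slide-master-content paragraph, split on <br/>. */
    static List<String> masterLines(String xhtml) {
        List<String> lines = new ArrayList<>();
        Matcher m = MASTER_P.matcher(xhtml);
        while (m.find()) {
            for (String part : m.group(1).split("<br/>")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed);
                }
            }
        }
        return lines;
    }

    /** Counts master-content lines whose entire content is a single "*". */
    static long countAsteriskOnly(String xhtml) {
        return masterLines(xhtml).stream().filter("*"::equals).count();
    }

    public static void main(String[] args) {
        // Excerpt of the output quoted in the report.
        String xhtml = "<body><div class=\"slideShow\"><div class=\"slide\"><p\n"
                + "class=\"slide-master-content\">*<br/>\n"
                + "*<br/>\n"
                + "</p>\n"
                + "<p class=\"slide-content\">Testing<br/>\n"
                + "testing<br/>\n"
                + "</p>\n";
        System.out.println(countAsteriskOnly(xhtml));
    }
}
```

On the excerpt above, only the two master-content lines are counted. The `slide-content` paragraph ("Testing", "testing") is not matched because its class differs.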
--
This message is automatically generated by JIRA.