MsPowerPointTextExtractor does not extract from PPTs with € sign
----------------------------------------------------------------
Key: JCR-1530
URL: https://issues.apache.org/jira/browse/JCR-1530
Project: Jackrabbit
Issue Type: Bug
Components: jackrabbit-text-extractors
Affects Versions: 1.4
Reporter: Dirk Feufel
The MsPowerPointTextExtractor class fails on PPT files that contain a € sign:
all text following the sign is ignored. Perhaps the POI PowerPointExtractor
should be used instead of parsing the data by hand. As a side effect, this
would simplify the code. Extraction could be done as follows:
    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        try {
            // Let POI read the slide records instead of parsing them by hand.
            PowerPointExtractor extractor = new PowerPointExtractor(stream);
            // true, true: include both the slide text and the notes text.
            return new StringReader(extractor.getText(true, true));
        } catch (RuntimeException e) {
            logger.warn("Failed to extract PowerPoint text content", e);
            return new StringReader("");
        } finally {
            try { stream.close(); } catch (IOException ignored) {}
        }
    }
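
The error handling in the proposal above (fall back to an empty reader when
parsing throws, and always close the stream) can be tried without POI on the
classpath. The following is only a minimal sketch under assumptions: the
TextSource interface, the SafeExtractionSketch class and the drain helper are
hypothetical names invented for illustration, standing in for the POI-backed
extractor. They are not part of Jackrabbit or POI.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

public class SafeExtractionSketch {

    /** Hypothetical stand-in for the POI-backed extractor (assumption). */
    interface TextSource {
        String read(InputStream stream) throws IOException;
    }

    /**
     * Same shape as the proposed extractText: a RuntimeException from the
     * parser yields empty content, and the stream is closed in every case.
     */
    static Reader extractText(InputStream stream, TextSource source)
            throws IOException {
        try {
            return new StringReader(source.read(stream));
        } catch (RuntimeException e) {
            // A real extractor would log the failure here.
            return new StringReader("");
        } finally {
            try { stream.close(); } catch (IOException ignored) { }
        }
    }

    /** Reads a Reader fully into a String. */
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A source that succeeds; \u20ac is the euro sign.
        TextSource ok = s -> "Price: 10 \u20ac per unit";
        // A source that fails the way a broken parser might.
        TextSource broken = s -> {
            throw new IllegalStateException("corrupt slide record");
        };

        InputStream empty1 = new ByteArrayInputStream(new byte[0]);
        InputStream empty2 = new ByteArrayInputStream(new byte[0]);
        System.out.println(drain(extractText(empty1, ok)));
        System.out.println(drain(extractText(empty2, broken)).isEmpty());
    }
}
```

Returning an empty reader keeps one unreadable document from aborting the
indexing of the rest of the content, at the cost of that document silently
having no full-text content, which is why the proposal logs a warning.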