MsPowerPointTextExtractor does not extract from PPTs with € sign
----------------------------------------------------------------

                 Key: JCR-1530
                 URL: https://issues.apache.org/jira/browse/JCR-1530
             Project: Jackrabbit
          Issue Type: Bug
          Components: jackrabbit-text-extractors
    Affects Versions: 1.4
            Reporter: Dirk Feufel


The MsPowerPointTextExtractor class fails on PPT files whose text contains a € 
sign: all text following that sign is ignored. Perhaps the POI 
PowerPointExtractor should be used instead of parsing the data by hand. As a 
side effect, this would simplify the code. Extraction could be done as follows:

        public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
                try {
                        PowerPointExtractor extractor = new PowerPointExtractor(stream);
                        return new StringReader(extractor.getText(true, true));
                } catch (RuntimeException e) {
                        logger.warn("Failed to extract PowerPoint text content", e);
                        return new StringReader("");
                } finally {
                        try { stream.close(); } catch (IOException ignored) {}
                }
        }
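
The report does not say why text after the € sign is lost. One possible 
explanation, offered here only as an assumption, is that the PPT binary format 
keeps text in either a one-byte-per-character record or a UTF-16 record, and 
that a hand-written parser handling only the former would miss text containing 
€ (U+20AC), which does not fit in one byte. The sketch below is not taken from 
Jackrabbit or POI; the class and method names are hypothetical and it only 
illustrates the encoding difference:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Jackrabbit or POI.
final class EuroEncodingDemo {

    /** Returns true if every char of the text has a code unit of at most 0xFF. */
    static boolean fitsInSingleBytes(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Plain ASCII text fits in one byte per character.
        System.out.println(fitsInSingleBytes("Price: 10 EUR"));
        // The euro sign is U+20AC, so this text needs a wider encoding.
        System.out.println(fitsInSingleBytes("Price: 10 \u20AC"));

        // In UTF-16LE the euro sign occupies two bytes, low byte first.
        byte[] utf16 = "\u20AC".getBytes(StandardCharsets.UTF_16LE);
        System.out.printf("%02X %02X%n", utf16[0] & 0xFF, utf16[1] & 0xFF);
    }
}
```

If this assumption holds, delegating to the POI extractor as proposed above 
would avoid the problem, since the extractor would no longer depend on 
hand-parsing one kind of text record.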


