https://issues.apache.org/bugzilla/show_bug.cgi?id=50955

           Summary: An error occurred while retrieving the text file.
           Product: POI
           Version: 3.8-dev
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
        AssignedTo: [email protected]
        ReportedBy: [email protected]


When I attempt to extract text from a file, the following error is output:

java.lang.IllegalStateException: Told we're for characters 0 -> 173225, but actually covers 173211 characters!
    at org.apache.poi.hwpf.model.TextPiece.<init>(TextPiece.java:50)
    at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:95)
    at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:54)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:68)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:42)
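
To make the numbers in the message concrete: the piece is declared to span characters 0 to 173225 (173225 characters), but only 173211 characters were found, a shortfall of 14. The following is a minimal, hypothetical sketch of that kind of consistency check. It is not POI's actual TextPiece code; the class name PieceLengthCheck and its methods are invented here for illustration only.

```java
public class PieceLengthCheck {

    /** Characters a piece claims to cover, given its declared start and end positions. */
    public static int claimedLength(int start, int end) {
        return end - start;
    }

    /**
     * Hypothetical check: compares the declared range against the number of
     * characters actually present and fails with a message in the same shape
     * as the one in this report.
     */
    public static void verify(int start, int end, int actualChars) {
        if (claimedLength(start, end) != actualChars) {
            throw new IllegalStateException("Told we're for characters " + start + " -> " + end
                    + ", but actually covers " + actualChars + " characters!");
        }
    }

    public static void main(String[] args) {
        // Values taken from the stack trace above.
        int start = 0, end = 173225, actual = 173211;
        System.out.println("Missing characters: " + (claimedLength(start, end) - actual));
        try {
            verify(start, end, actual);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, the declared range and the actual text disagree by 14 characters, which is what the constructor rejects.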

Here is the source code I am using to extract text from the file:

    public Boolean parseFile(String pathToFile) {
        InputStream isr = null;
        try {
            isr = new FileInputStream(pathToFile);
            WordExtractor word = new WordExtractor(isr);
            // Join all paragraphs into one string, each prefixed by a space.
            StringBuilder fileContent = new StringBuilder();
            for (String paragraph : word.getParagraphText()) {
                fileContent.append(' ').append(paragraph);
            }
            AddDataToIndex.class.newInstance().doAddData(fileContent.toString(), pathToFile);
            return true;
        } catch (OldWordFileFormatException ex) {
            // Older formats fall back to the HWPFOldDocument path.
            return parseWord6(pathToFile);
        } catch (Exception ex) {
            Vars.logger.fatal(ex);
            return false;
        } finally {
            // isr is still null if the FileInputStream constructor threw.
            if (isr != null) {
                try {
                    isr.close();
                } catch (IOException ex) {
                    Vars.logger.fatal(ex);
                }
            }
        }
    }

    private Boolean parseWord6(String pathToFile) {
        FileInputStream fis = null;
        try {
            File docFile = new File(pathToFile);
            fis = new FileInputStream(docFile.getAbsolutePath());
            POIFSFileSystem pfs = new POIFSFileSystem(fis);
            // The IllegalStateException above is thrown from this constructor.
            HWPFOldDocument doc = new HWPFOldDocument(pfs);
            Word6Extractor docExtractor = new Word6Extractor(doc);
            return true;
        } catch (Exception ex) {
            Vars.logger.fatal("Error: ", ex);
            return false;
        } finally {
            // fis is still null if the FileInputStream constructor threw.
            if (fis != null) {
                try {
                    fis.close();
                } catch (IOException ex) {
                    Vars.logger.fatal("Error", ex);
                }
            }
        }
    }

The file I tried to parse is attached.
