https://issues.apache.org/bugzilla/show_bug.cgi?id=50955
Summary: An error occurred while retrieving the text file.
Product: POI
Version: 3.8-dev
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: HWPF
AssignedTo: [email protected]
ReportedBy: [email protected]
When I attempt to extract text from a file, I get the following error output:
java.lang.IllegalStateException: Told we're for characters 0 -> 173225, but actually covers 173211 characters!
    at org.apache.poi.hwpf.model.TextPiece.<init>(TextPiece.java:50)
    at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:95)
    at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:54)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:68)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:42)
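Reading the message literally, the piece is declared to span 173225 - 0 = 173225 characters, while the text actually decoded is 173211 characters long, so 14 characters are unaccounted for. The following is only a minimal sketch of that kind of consistency check in plain Java, assuming the check simply compares the declared range with the decoded length. The class and method names are hypothetical, and this is not the code in org.apache.poi.hwpf.model.TextPiece.

```java
// Hypothetical illustration only; NOT the actual POI TextPiece implementation.
final class PieceLengthCheck {

    /** Number of characters the piece claims to cover (end is exclusive). */
    static int declaredLength(int start, int end) {
        return end - start;
    }

    /** Throws if the decoded text length differs from the declared range. */
    static void check(int start, int end, int actualLength) {
        if (declaredLength(start, end) != actualLength) {
            throw new IllegalStateException(
                "Told we're for characters " + start + " -> " + end
                + ", but actually covers " + actualLength + " characters!");
        }
    }

    public static void main(String[] args) {
        // Values taken from the error message in this report.
        int start = 0;
        int end = 173225;
        int actual = 173211;

        // Difference between the declared and the decoded length.
        System.out.println("missing characters: " + (declaredLength(start, end) - actual));

        try {
            check(start, end, actual);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Under that assumption, either the recorded character range or the decoding of the piece's bytes is off by 14 characters for the attached file.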
Here is the source code I use to extract text from the file:
public Boolean parseFile(String pathToFile) {
    InputStream isr = null;
    try {
        isr = new FileInputStream(pathToFile);
        WordExtractor word = new WordExtractor(isr);
        String fileContent = "";
        String[] paragraphes = word.getParagraphText();
        for (String paragraph : paragraphes) {
            fileContent += " " + paragraph;
        }
        AddDataToIndex.class.newInstance().doAddData(fileContent,
                pathToFile);
        return true;
    } catch (OldWordFileFormatException ex) {
        // Old (Word 6 / Word 95) file: fall back to the old-format parser.
        return parseWord6(pathToFile);
    } catch (Exception ex) {
        Vars.logger.fatal(ex);
        return false;
    } finally {
        try {
            // isr stays null if the FileInputStream could not be opened.
            if (isr != null) {
                isr.close();
            }
        } catch (IOException ex) {
            Vars.logger.fatal(ex);
        }
    }
}

private Boolean parseWord6(String pathToFile) {
    FileInputStream fis = null;
    try {
        File docFile = new File(pathToFile);
        fis = new FileInputStream(docFile.getAbsolutePath());
        POIFSFileSystem pfs = new POIFSFileSystem(fis);
        // The IllegalStateException in the trace is thrown from this constructor.
        HWPFOldDocument doc = new HWPFOldDocument(pfs);
        Word6Extractor docExtractor = new Word6Extractor(doc);
        return true;
    } catch (Exception ex) {
        Vars.logger.fatal("Error: ", ex);
        return false;
    } finally {
        try {
            // fis stays null if the FileInputStream could not be opened.
            if (fis != null) {
                fis.close();
            }
        } catch (IOException ex) {
            Vars.logger.fatal("Error", ex);
        }
    }
}
The file I tried to parse is attached.