https://issues.apache.org/bugzilla/show_bug.cgi?id=47496
Summary: Strange MS Word file reading behavior
Product: POI
Version: 3.5-dev
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: critical
Priority: P1
Component: HWPF
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected]
HWPF behaves very strangely on some .doc files in Russian encoding that
consist almost entirely of tables. When reading them, HWPF returns the first
half of the document correctly (each logical element in its own paragraph and
character run), but the second half of the document is returned as a single
paragraph containing a single character run. One effect of this is that
TableIterator does not work correctly: it returns only half of the document's
tables.
Debugging shows some strange jumps in the start<->end values when reading the
Plex of CPs. Here is a printout of the debug info (produced by code lines
manually injected into a recompiled PlexOfCps class):
1. Creating TextPieceTable (in ComplexFileTable analyzing):
-----------------------------
start = 16474 size=448 sizeOfStruct=8
-----------------------------
Start -> 0 to end <-256
Start -> 256 to end <-1280
Start -> 1280 to end <-2048
Start -> 2048 to end <-3072
Start -> 3072 to end <-3840
Start -> 3840 to end <-4864
...
Start -> 25856 to end <-26368
Start -> 26368 to end <-27136
Start -> 27136 to end <-27648
Start -> 27648 to end <-28928
Start -> 28928 to end <-29184
Start -> 29184 to end <-58063 <--- !!! HERE !!!
2. Creating PAPBinTable:
-----------------------------
start = 7117 size=5020 sizeOfStruct=4
-----------------------------
Start -> 2048 to end <-2338
Start -> 2338 to end <-2546
Start -> 2546 to end <-2556
...
Start -> 59556 to end <-59694
Start -> 59694 to end <-59708
Start -> 59708 to end <-60402
Start -> 60402 to end <-264814 <--- !!! HERE !!!
Start -> 264814 to end <-264828
Start -> 264828 to end <-265600
Start -> 265600 to end <-265604
...
Start -> 320214 to end <-321000
Start -> 321000 to end <-321936
Start -> 321936 to end <-321950
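The two printouts above can be checked mechanically. Below is a minimal sketch
in plain Java, with no POI dependency. The class PlexDebug and both of its
methods are hypothetical helpers, not part of POI. entryCount assumes the usual
PLC layout of (n + 1) four-byte boundary values followed by n structures of
sizeOfStruct bytes each. suspiciousRanges uses an arbitrary heuristic (a range
longer than a chosen factor times the median range length) and only marks
candidates for inspection; a flagged range is not necessarily wrong.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical debugging helper; not part of POI. */
final class PlexDebug {

    private PlexDebug() {}

    /**
     * Number of entries in a PLC of the given byte size, assuming the layout
     * (n + 1) four-byte boundary values followed by n structures of
     * sizeOfStruct bytes each.
     */
    static int entryCount(int size, int sizeOfStruct) {
        return (size - 4) / (4 + sizeOfStruct);
    }

    /**
     * Returns the indexes of ranges whose length exceeds factor times the
     * median range length. Range i spans boundaries[i]..boundaries[i + 1].
     * The threshold is an assumption chosen for illustration only.
     */
    static List<Integer> suspiciousRanges(int[] boundaries, int factor) {
        List<Integer> flagged = new ArrayList<>();
        int n = boundaries.length - 1;
        if (n < 1) {
            return flagged; // fewer than two boundaries: no ranges to check
        }
        long[] lengths = new long[n];
        for (int i = 0; i < n; i++) {
            lengths[i] = (long) boundaries[i + 1] - boundaries[i];
        }
        long[] sorted = lengths.clone();
        Arrays.sort(sorted);
        long median = sorted[n / 2]; // upper median for even n
        for (int i = 0; i < n; i++) {
            if (lengths[i] > factor * median) {
                flagged.add(i);
            }
        }
        return flagged;
    }
}
```

With the values printed above, entryCount(448, 8) gives 37 entries for the
TextPieceTable plex and entryCount(5020, 4) gives 627 for the PAPBinTable plex.
Feeding the boundaries around each marked line to suspiciousRanges with a
factor of 10 flags exactly the marked range in both cases.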
Unfortunately, I can't attach the document files because they contain private
information.