https://bz.apache.org/bugzilla/show_bug.cgi?id=50955

Tim Allison changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #12 from Tim Allison ---
r1790061

If anyone has a chance to review this before the next release, that'd be great.

The current heuristic looks in the font table for a codepage that is neither
the default nor the symbol codepage, and then applies that.
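
To make the idea concrete, here is a minimal sketch of that kind of heuristic.
It is not the code committed in r1790061. The class name, method name, and
fallback parameter are hypothetical, and the charset-to-codepage table lists
only a few well-known Windows charset identifiers, so it is deliberately
incomplete.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the POI implementation.
final class FontCodepageHeuristic {
    // Windows charset identifiers that carry no useful encoding hint here.
    static final int ANSI_CHARSET = 0;
    static final int DEFAULT_CHARSET = 1;
    static final int SYMBOL_CHARSET = 2;

    // A few Windows charset -> codepage pairs (incomplete on purpose).
    static final Map<Integer, Integer> CHARSET_TO_CODEPAGE = Map.of(
            128, 932,    // SHIFTJIS_CHARSET
            134, 936,    // GB2312_CHARSET
            136, 950,    // CHINESEBIG5_CHARSET
            204, 1251);  // RUSSIAN_CHARSET

    // Returns the codepage of the first font whose charset is not
    // ANSI/default/symbol and is present in the table, else the fallback.
    static int guessCodepage(List<Integer> fontCharsets, int fallbackCodepage) {
        for (int chs : fontCharsets) {
            if (chs == ANSI_CHARSET || chs == DEFAULT_CHARSET || chs == SYMBOL_CHARSET) {
                continue;
            }
            Integer codepage = CHARSET_TO_CODEPAGE.get(chs);
            if (codepage != null) {
                return codepage;
            }
        }
        return fallbackCodepage;
    }
}
```

With font charsets 0, 2, 136, 204 this picks the first informative entry (136),
which is also why a document with several non-default encodings can defeat it.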

I was able to find only one file in ~1300 where this heuristic fails, and I'll
open a follow-up issue for that.

The other item that I worked towards fixing is that we need special handling
for Big5. MS Word 6.0 stored e.g. 7C B7 in reverse order B7 7C, and it
zero-padded ASCII characters.  Even if we flip the bytes,
new String(byte[], "Big5") doesn't strip out the zero padding in the ASCII.
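
A rough sketch of the two steps (flip, then strip the padding) before handing
the bytes to the decoder. It assumes every character occupies one 2-byte unit
in the file, that a double-byte character is stored with its two bytes swapped,
and that an ASCII character is stored as the character followed by 0x00. The
class and method names are hypothetical, and Java's "Big5" charset may not
match Microsoft's variant for every character.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical illustration only; not the POI implementation.
final class PaddedBig5Sketch {
    // Rewrites swapped, zero-padded 2-byte units into a plain Big5 byte stream.
    // A trailing odd byte, if any, is ignored.
    static byte[] normalize(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
        for (int i = 0; i + 1 < raw.length; i += 2) {
            int first = raw[i] & 0xFF;
            int second = raw[i + 1] & 0xFF;
            if (second == 0) {
                out.write(first);   // zero-padded ASCII: keep the char, drop the 0x00
            } else {
                out.write(second);  // swapped pair: the lead byte was stored second
                out.write(first);
            }
        }
        return out.toByteArray();
    }

    // Decodes the normalized bytes; throws if the runtime lacks a Big5 charset.
    static String decode(byte[] raw) {
        return new String(normalize(raw), Charset.forName("Big5"));
    }
}
```

For the bytes 7C B7 41 00, normalize yields B7 7C 41, which the Big5 decoder can
then read as one double-byte character followed by "A".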

There remains the basic problem that TextPiece stores data in a StringBuilder,
and the actual conversion of bytes to chars isn't straightforward.

For example, if we assume that Big5 requires 2x the number of bytes, all is
well with storage, but then it contains the 0 padding, and our code assumes
that the StringBuilder contains actual strings, not this zero-padded
stuff...so we'd have to strip the padding out.  From a storage perspective, and
a "closer to MS Word" perspective, this is probably better.  If we count the
number of bytes read per number of chars, we get a mismatch.  There's no
apparent easy solution to this.
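
The mismatch can be shown with a little arithmetic. This sketch assumes the
same layout as described above (one 2-byte unit per character, ASCII padded
with 0x00); the names are hypothetical.

```java
// Hypothetical illustration only; not the POI implementation.
final class ByteCountMismatchSketch {
    // Character count when every character occupies one 2-byte unit.
    static int paddedCharCount(byte[] raw) {
        return raw.length / 2;
    }

    // Byte count after stripping the padding: one byte for a zero-padded
    // unit, two bytes for a real double-byte character.
    static int strippedByteCount(byte[] raw) {
        int count = 0;
        for (int i = 0; i + 1 < raw.length; i += 2) {
            count += (raw[i + 1] == 0) ? 1 : 2;
        }
        return count;
    }
}
```

For 41 00 7C B7 the padded form gives 4 bytes for 2 chars, while the stripped
form gives 3 bytes for the same 2 chars, so a fixed bytes-per-char ratio only
holds while the padding is kept.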

Finally, I couldn't find a way of linking runs or text pieces to fonts. In the
few files I found with multiple non-default encodings, the font encoding offset
for the FFn in the runs was always 0, even though the actual font used was not
0, if we go by the codepage info.
