https://bz.apache.org/bugzilla/show_bug.cgi?id=60953
Bug ID: 60953
Summary: Improve Big5 handling for Word 6.0
Product: POI
Version: 3.16-dev
Hardware: PC
Status: NEW
Severity: enhancement
Priority: P2
Component: HWPF
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Created attachment 34898
--> https://bz.apache.org/bugzilla/attachment.cgi?id=34898&action=edit
Example bilingual English/Chinese Big5 Word 6.0 file
While working on Bug 50955, I found that Microsoft used its own variant of
Big5 in Word 6.0 files, in which ASCII characters are zero-padded.
I included some code to handle this, but it ought to be cleaned up.
An example of Big5 used to encode English is already in our set: Bug51944.doc.
Some notes will follow.
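
To illustrate the zero padding described above, here is a minimal Java sketch.
It is not the code currently in POI. The class and method names are
hypothetical, and the byte layout is an assumption made for illustration: each
character is taken to occupy one little-endian 16-bit unit, with ASCII stored
as the character byte followed by 0x00, and Big5 characters stored as trail
byte followed by lead byte. The real files may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of POI's API.
final class PaddedBig5Sketch {

    // ASSUMED layout: one little-endian 16-bit unit per character.
    //   ASCII: low byte = character, high byte = 0x00 (the zero padding)
    //   Big5:  low byte = trail byte, high byte = lead byte
    // Returns a plain Big5 byte stream with the padding removed.
    // A dangling odd byte at the end of the input is ignored.
    static byte[] unpad(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i + 1 < data.length; i += 2) {
            int low = data[i] & 0xFF;
            int high = data[i + 1] & 0xFF;
            if (high == 0) {
                out.write(low);   // zero-padded single-byte character
            } else {
                out.write(high);  // Big5 lead byte
                out.write(low);   // Big5 trail byte
            }
        }
        return out.toByteArray();
    }

    // Decodes the unpadded bytes. "Big5" is an extended charset, so
    // Charset.forName may throw UnsupportedCharsetException on a reduced JRE.
    static String decode(byte[] data) {
        return new String(unpad(data), Charset.forName("Big5"));
    }

    public static void main(String[] args) {
        // 'A' (zero-padded) followed by Big5 0xA4E5 in the assumed byte order.
        byte[] padded = { 0x41, 0x00, (byte) 0xE5, (byte) 0xA4 };
        System.out.println(decode(padded));
    }
}
```

Splitting the unpadding from the charset decoding keeps the Word-specific part
separate from the standard Big5 decoder, which may make it easier to swap in a
different charset (for example a cp950 variant) if that turns out to be needed.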
I'm also attaching a better bilingual Big5 English/Chinese example from Apache
Tika's Common Crawl corpus.
Many thanks, again, to Common Crawl, Dominik Stadler and Rackspace.