https://bz.apache.org/bugzilla/show_bug.cgi?id=60953

            Bug ID: 60953
           Summary: Improve Big5 handling for Word 6.0
           Product: POI
           Version: 3.16-dev
          Hardware: PC
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: HWPF
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Created attachment 34898
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34898&action=edit
Example bilingual English/Chinese Big5 Word 6.0 file

While working on Bug 50955, I found that MS used its own variant of the Big5
encoding, in which ASCII characters are padded with a zero byte.
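
To illustrate the kind of decoding this implies, here is a minimal sketch. It
is not the code attached to this bug. The class and method names are
hypothetical, and the byte layout is an assumption: every character is taken
to occupy two bytes, low byte first, with a zero high byte marking a padded
ASCII character and any other pair being a Big5 lead/trail byte pair.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of POI: decodes text where every character
// occupies two bytes and ASCII characters are padded with a zero byte.
public class PaddedBig5Sketch {

    // Assumes the JDK in use ships the "Big5" charset.
    private static final Charset BIG5 = Charset.forName("Big5");

    public static String decode(byte[] data) {
        if (data.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of bytes");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += 2) {
            // Assumption: low byte first, high byte second.
            int low = data[i] & 0xFF;
            int high = data[i + 1] & 0xFF;
            if (high == 0) {
                // Zero-padded single-byte character: emit it directly.
                sb.append((char) low);
            } else {
                // Double-byte character: restore lead/trail order for Big5.
                byte[] pair = { (byte) high, (byte) low };
                sb.append(new String(pair, BIG5));
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, `PaddedBig5Sketch.decode(new byte[] {0x41, 0x00})`
yields "A". The actual layout should be checked against the attached files
before relying on this.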

I included some code that ought to be cleaned up.

An example of Big5 used to encode English is already in our set: Bug51944.doc.

Some notes will follow.

I'm also attaching a better bilingual Big5 English/Chinese example from Apache
Tika's Common Crawl corpus.

Many thanks, again, to Common Crawl, Dominik Stadler and Rackspace.
