On 23 Apr 2011, at 4:19 PM, Fridrich Strba wrote:

> Edward,
>
> On Sat, 2011-04-23 at 11:13 -0400, Edward Mendelson wrote:
>> Also potentially useful: The character map document that shipped with 6.x
>> for DOS is here:
>> http://dl.dropbox.com/u/271144/CHARACT6.DOC
>> It includes all 14 sets.
>
> Thanks for this one. I ran it through wpd2text and it might be that we
> actually do a good job here. Now, I could use some help. Could you
> people simply run wpd2text on those characters and check whether
> all glyphs in all charsets are correctly mapped? If you find an error, just
> note the charset, the character number and what the correct Unicode
> mapping would be. For characters that correspond to a sequence of 2 or
> more Unicode characters, please write that down, but also give me the
> closest 1-1 approximation.
>
>> I'll find and post the CHARMAP.TST that shipped with 5.1 Hebrew and Arabic
>> later on.
>
> The wp2rtf zip file contains a TEST.WP file that has all the charsets in
> it. If you have the visual representation, just run it through wpd2text
> and compare. Again, I would appreciate information about wrongly
> mapped glyphs.
Fridrich,

Both these files (CHARACT6.DOC and TEST.WP, which is the same as the WP5.1 CHARACTR.DOC) produced very good results when run through wpd2html. Here are some quick notes:

1. Quite a few WP characters have no Unicode equivalents, and there is no way to fix that.

2. In TEST.WP (the WP5.1 file), characters 6,56 through 6,234 didn't convert at all, but the same characters are converted correctly in the WP6.x CHARACT6.DOC. You evidently have different tables for 5.x and 6+, and I think you can simply copy the 6,56 through 6,234 mappings from the 6+ table to the 5.x table.

3. In the converted CHARACT6.DOC, I think it may be possible to add these:

   2,44 seems to be U+0361
   2,45 seems to be U+035C
   4,100 seems to be U+1D11E
   4,101 seems to be U+1D122
   6,83 seems to be U+2A38
   9,83 seems to be U+05AA

I'll have to check the Hebrew tomorrow, but since I don't know any Hebrew, I'll be guessing.

Smokey, I think you know Arabic. Is there any chance you could check these?

I finally tested the Arabic WP 5.1 files from this page:

http://www.un.org/popin/unpopcom/32ndsess/gass.htm

wpd2odt says they are not WordPerfect files. Apparently libwpd doesn't handle documents created by Arabic WP5.1 or Hebrew WP5.1.

I hope these details help somewhat, and I will try to report more tomorrow.

Edward Mendelson
Contributing Editor
PC Magazine

_______________________________________________
Libwpd-devel mailing list
Libwpd-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libwpd-devel
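Notes 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch under an assumed, hypothetical table layout (for each WordPerfect charset, a dict from character number to Unicode code point, where a missing key means "not converted"); it is not libwpd's own data structure, and the names `copy_range`, `PROPOSED` and `describe` are illustrative only. It fills the 5.x gaps in charset 6 from the 6+ table without overwriting existing entries, and prints the Unicode name of each proposed addition as a quick sanity check.

```python
import unicodedata

# Hypothetical layout (assumption, not libwpd's real tables):
# charset number -> {character number -> Unicode code point}.
Table = dict[int, dict[int, int]]


def copy_range(src: Table, dst: Table, charset: int, first: int, last: int) -> int:
    """Copy src[charset][first..last] into dst, only where dst has no mapping.

    Returns the number of entries copied; existing dst entries are left alone.
    """
    src_set = src.get(charset, {})
    dst_set = dst.setdefault(charset, {})
    copied = 0
    for ch in range(first, last + 1):
        if ch in src_set and ch not in dst_set:
            dst_set[ch] = src_set[ch]
            copied += 1
    return copied


# Additions proposed in note 3, keyed by (charset, character number).
PROPOSED = {
    (2, 44): 0x0361,
    (2, 45): 0x035C,
    (4, 100): 0x1D11E,
    (4, 101): 0x1D122,
    (6, 83): 0x2A38,
    (9, 83): 0x05AA,
}


def describe(code_point: int) -> str:
    """Format a code point with its Unicode name, for checking by eye."""
    name = unicodedata.name(chr(code_point), "<unnamed>")
    return f"U+{code_point:04X} {name}"


if __name__ == "__main__":
    for (charset, ch), cp in sorted(PROPOSED.items()):
        utf8_len = len(chr(cp).encode("utf-8"))
        print(f"{charset},{ch} -> {describe(cp)} ({utf8_len} bytes in UTF-8)")
```

Filling only the gaps keeps any 5.x mapping that already works intact. The musical symbols in charset 4 lie outside the Basic Multilingual Plane, so they take 4 bytes in UTF-8 and a surrogate pair in UTF-16, which any output path has to handle.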