Re: arachne-digest V1 #1166

cpc.off Mon, 26 Jun 2000 18:11:58 -0700
On 26 Jun 00 at 12:23, owner-arachne-digest@arachne wrote:

> 
> Date: Sat, 24 Jun 2000 17:22:17 +0000
> From: "Bastiaan Edelman" <[EMAIL PROTECTED]>
> Subject: Re: Viewing RTF and other stuff
> 
> > So what I have learned to live with is deformatting everything to
> > ASCII text first, then do the codepage conversion and finally
> > reformate the texts manually again. I cannot say that I am happy with
> > this. What about setting up a new World Championship of Converting
> > Formatted Documents while Preserving Cultural Diversity of Eastern
> > and Western Europe?
> 
> I made a conversion program (in BASIC) to speed up the conversion.
> First I convert everything to 8-bit ASCII... all the information is
> still in the text file but often not in the wanted caracters.
> Then I start the BASIC conversion program that has a list of caracters
> to convert into any other wanted caracter.

My point was not really the conversion problem. As I have never 
learned any serious programming language, I am using Basic programs, 
too. They are a bit slow, but they work. To achieve a really 
accurate translation from one 8-bit codepage to another, I use the 
16-bit Unicode standard as an intermediate. Translation tables 
to and from Unicode can be found at 

<ftp://ftp.unicode.org:21/Public/>

and an explanation at 

<http://charts.unicode.org/>
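
The translation through Unicode described above can be sketched in a few 
lines of Python (not the Basic programs mentioned in this thread). This is 
only an illustration under assumptions: cp852 and cp1250 are example choices 
for a DOS and a Windows East European codepage, build_table is a 
hypothetical helper name, and Python's built-in codecs stand in for the 
mapping tables from unicode.org.

```python
def build_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-byte translation table from codepage src to codepage dst.

    Every source byte is first mapped to its Unicode character and then to
    the byte it has in the target codepage. Characters that do not exist in
    the target (or bytes undefined in the source) become the fallback byte.
    """
    table = bytearray(256)
    for code in range(256):
        try:
            char = bytes([code]).decode(src)   # 8-bit code -> Unicode
            out = char.encode(dst)             # Unicode -> 8-bit code
        except UnicodeError:
            out = fallback
        table[code] = out[0]
    return bytes(table)


# Example: a Czech sample stored in the DOS codepage, converted to Windows.
sample = "Příliš žluťoučký kůň"
dos_bytes = sample.encode("cp852")
win_bytes = dos_bytes.translate(build_table("cp852", "cp1250"))
```

The table is computed once and can then be applied to whole files with 
bytes.translate, which matches the idea of a fixed list of characters to 
convert.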

What I still have to do manually afterwards, and what I do not like 
at all, is the reformatting of documents. Even if the HTML or RTF 
had perfect formatting (and logical structure!) before, after the 
conversion procedure I am left with ASCII plain text and have to 
start looking for chapter headlines from the very beginning again. 

This is what I would like to improve. Let me try to summarize my 
experiences: 

Conversion from       to                method

Word for DOS          Word for DOS      evident file structure -
                      (only West        simple home-made 
                      European)         conversion utilities           
                              
Word for WIN          Word for DOS      only through ASCII with
West European                           VIEW.EXE or CatDoc 
                                        (not without problems)

Word for WIN          Word for DOS      only through ASCII with
East European                           CatDoc (not without problems)

RTF                   Word for DOS      format directly 
West European                           supported

RTF                   Word for DOS      ???
East European                           

HTML                  Word for DOS      through RTF by utilities
West European                           MARTHA/ISHTAR

HTML                  Word for DOS      through RTF ???
East European                           but DOX does not work (1)

RTF                   HTML              MARTHA/ISHTAR, DOX, R2H
West European                           

RTF                   HTML              R2H utility (DOS version)
East European                           DOX (not without problems)

HTML                  RTF               MARTHA/ISHTAR
West European                           

HTML                  RTF               ???
East European                           DOX does not work (1)

Word for DOS          RTF               format directly supported
West European 

Word for DOS          RTF               ???
East European                           

Word for DOS          HTML              through RTF by utilities
West European                           MARTHA/ISHTAR

Word for DOS          HTML              through RTF by utilities 
East European                           R2H or DOX

Of course this table is not complete. For the HTML format it matters 
a great deal whether diacritical characters are expressed in 8-bit 
code, as so-called entities (e.g. &uuml;; BTW: are they already 
standardized for Eastern Europe?) or as numeric codes 
(e.g. &#NNN;).
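
To make the three notations concrete, here is a small Python sketch. It 
assumes Python's standard html module; to_numeric_refs and from_markup 
are hypothetical helper names. It shows that the numeric form can be 
produced for any character, whether or not a named entity exists for it.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character by a numeric reference (&#NNN;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_markup(text: str) -> str:
    """Resolve named entities and numeric references back to characters."""
    return html.unescape(text)


west = to_numeric_refs("Müller")      # West European example
east = to_numeric_refs("Dvořák")      # East European example
back = from_markup("M&uuml;ller and Dvo&#345;&#225;k")
```

Once a text is in the numeric form it is plain 7-bit ASCII, so no further 
codepage conversion is needed for it.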

Word for DOS is my preferred editor, because it supports 
style sheets, RTF output and logical structuring of documents. But 
the reformatting problem is more general, as I wanted to indicate by 
mentioning HTML. This is why I am posting it here again. 

(1) Annotation: I found a shareware program DOX

      <http://users.hunterlink.net.au/~mabatp/>

    Perhaps it can be helpful in some of those cases I indicated 
    by question marks. I cannot get it to run in the direction 
    from HTML to RTF. Is this a feature limited to the registered 
    version? 

I do not expect to find a ready-made freeware utility for everything 
on the web. But not being a programmer, I would be grateful for an 
explanation of the structure of those East European RTF files. 
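
As a partial and hedged illustration of that structure: in RTF written by 
newer Windows versions of Word, the header can name a codepage with the 
\ansicpg keyword, and each 8-bit character appears in the text as \'xx 
with two hex digits. The Python sketch below assumes exactly that layout; 
the sample string is made up for illustration, decode_rtf_escapes is a 
hypothetical helper name, and files from other programs may declare their 
character set differently.

```python
import re


def decode_rtf_escapes(rtf: str, default_codepage: str = "cp1252") -> str:
    """Replace RTF \\'xx escapes by the characters they stand for.

    The codepage is taken from an \\ansicpgNNNN keyword if one is present,
    otherwise default_codepage is assumed. All other RTF control words are
    left untouched.
    """
    match = re.search(r"\\ansicpg(\d+)", rtf)
    codec = f"cp{match.group(1)}" if match else default_codepage

    def replace(mo: re.Match) -> str:
        byte = bytes([int(mo.group(1), 16)])
        return byte.decode(codec, errors="replace")

    return re.sub(r"\\'([0-9a-fA-F]{2})", replace, rtf)


# Made-up sample: the Czech word "Příliš" in codepage 1250.
sample_rtf = r"{\rtf1\ansi\ansicpg1250 P\'f8\'edli\'9a}"
decoded = decode_rtf_escapes(sample_rtf)
```

The formatting keywords stay in place, so in principle only the escaped 
characters would need conversion while the document structure is kept.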

> This works very nice... except for some caracters that are NOT in the
> WORD caracterset.
> Than WORD gives a large string with means: this caracter can be found in
> the special WORD codepage #x and you have get caracter #y on that
> codepage.
> An example is the 'omega' sign, much used in electronics to denote
> 'resistance' and also in the Greek alfabet.
> In code page 437 it is #234 but Micro$oft gives something like the
> string:"}{\f0\fs22 {\field{\*\fldinst SYMBOL 87 \\f "Symbol" \\s   
> 11}{\fldrslt\f3\fs22}}}{\f0\fs22"
I have no idea what this can be. It reminds me of the 
language of Word's printer driver editor...
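
For what it is worth, the quoted string looks like an RTF field that asks 
for character number 87 from the font named "Symbol". A cautious Python 
sketch for picking such fields out of a file follows. The one-entry 
SYMBOL_FONT table is an assumption that position 87 (the letter W) holds 
the capital omega in that font, and symbol_fields is a hypothetical helper 
name.

```python
import re

# Partial, hand-made table (assumption): position in the Symbol font -> Unicode.
SYMBOL_FONT = {87: "\u03a9"}  # capital omega

FIELD = re.compile(r'SYMBOL\s+(\d+)\s+\\+f\s+"([^"]+)"')


def symbol_fields(rtf: str) -> list:
    """Return (character number, font name) for each SYMBOL field found."""
    return [(int(code), font) for code, font in FIELD.findall(rtf)]


quoted = (r'}{\f0\fs22 {\field{\*\fldinst SYMBOL 87 \\f "Symbol" \\s 11}'
          r'{\fldrslt\f3\fs22}}}{\f0\fs22')
found = symbol_fields(quoted)
omega_from_dos = bytes([234]).decode("cp437")  # code 234 in codepage 437
```

With such a table, a conversion program could replace the whole field by 
the single character wanted in the target codepage.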

Greetings Christof Lange - Prague
[EMAIL PROTECTED]