Just a quick reply (it's bedtime over here): there may be two problems.
The first is that the mail program put in an unwanted line break after
the =~ part. Just remove it; the substitution should all be on one line.
The second: you'll need a fairly recent version of perl for it to work.
What do you get when you run
perl --version
I guess that for UTF-8 to work, it should be at least 5.8.0 (the ':utf8'
error you quote looks like what an older perl would print). Your basic
idea of the usage is right (I'm not a Windows person, but I assume it
should be the same): save the script as utf2tex.pl, make it executable,
and call it as utf2tex.pl FILENAME.txt.
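In case it helps, here is a minimal sketch of what such a conversion
script could look like. It is not the script from my earlier mail, only
an illustration: the names %map and utf2tex are made up, the table just
repeats the seven letters from your list, and it assumes that the input
file really is UTF-8 and that perl is 5.8.0 or later (needed for the
:encoding layer).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mapping taken from the list in this thread:
# Arabic code point => Latin transcription.
my %map = (
    "\x{0627}" => 'A',   # alif
    "\x{0628}" => 'b',   # ba
    "\x{062C}" => 'j',   # jim
    "\x{062F}" => 'd',   # dal
    "\x{0647}" => 'h',   # ha
    "\x{0648}" => 'w',   # waw
    "\x{0632}" => 'z',   # zay
);

# Replace every character found in the table; leave all others untouched.
sub utf2tex {
    my ($text) = @_;
    $text =~ s/(.)/exists $map{$1} ? $map{$1} : $1/ges;
    return $text;
}

# Only read a file when a name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    # Decode the input as UTF-8 while reading (perl 5.8.0 or later).
    open(my $in, '<:encoding(UTF-8)', $file)
        or die "Cannot open $file: $!";
    # Characters that are not in the table are written back out as UTF-8.
    binmode(STDOUT, ':encoding(UTF-8)');
    while (my $line = <$in>) {
        print utf2tex($line);
    }
    close($in);
}
```

With a sketch like this you would run perl utf2tex.pl unicode-utf.txt
and redirect the output to a new file; anything that is not in the table
passes through unchanged.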
I guess it would be easiest to convert the UTF-8 to ASCII directly; that
way you could later convert it back. I have a set of scripts that
do just that -- convert babel Greek into UTF-8 and back.
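The round trip can be sketched like this, again only as an illustration
and not one of my actual scripts: to_ascii and to_arabic are made-up
names, the table is the one from your list, and the way back is only
safe if the text contains no Latin letters of its own from that set.

```perl
use strict;
use warnings;

# Forward: Arabic letters to the ASCII transcription from this thread.
sub to_ascii {
    my ($text) = @_;
    $text =~ tr/\x{0627}\x{0628}\x{062C}\x{062F}\x{0647}\x{0648}\x{0632}/Abjdhwz/;
    return $text;
}

# Backward: the same table with the two sides swapped.
sub to_arabic {
    my ($text) = @_;
    $text =~ tr/Abjdhwz/\x{0627}\x{0628}\x{062C}\x{062F}\x{0647}\x{0648}\x{0632}/;
    return $text;
}
```

Because each letter maps to exactly one ASCII character, tr/// is enough
here; a transcription that needs several characters per letter would
need substitutions instead.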
If you need more help, I'll look into it tomorrow!
Best
Thomas
On Sat, 2004-06-05 at 23:33, Idris Samawi Hamid wrote:
> On Sat, 05 Jun 2004 22:41:39 +0200, Thomas A. Schmitz
> <[EMAIL PROTECTED]> wrote:
>
> > Idris,
> >
> > I know a bit of perl and would love to help. However, I fear that
> > sending us your stuff via mail will be a bit difficult because the utf-8
> > characters get transformed into gibberish.
>
> Thnx 4 such a speedy reply! I don't think you are getting gibberish
> though; you should be getting the extended ascii representation. So the
> letter alif (hex 0627) should look like this:
>
> Ø§
>
> Do you get a forward-slashed circle and a section symbol? If so, that's
> the ascii representation I'm trying to convert to the letter `A'.
>
> Here are the codes you want:
>
> Ø§ [0627] => A
>
> Ø¨ [0628] => b
>
> Ø¬ [062C] => j
>
> Ø¯ [062F] => d
>
> Ù‡ [0647] => h
>
> Ùˆ [0648] => w
>
> Ø² [0632] => z
>
> Let me explain my situation more clearly:-)
>
> I have a unicode editor, Unitype Global Writer. I save a unicode document
> as a utf *.txt file. When I open that saved file in my TeX editor
> (WinEdt), it comes out as extended ascii (that's the "gibberish"). So what
> I wanted to do was convert the ascii "gibberish" to my Latin
> transcription. It seems that what you are suggesting is to use the hex
> representation and convert the unicode txt file into a Latin transcription
> file directly and bypass the gibberish.
>
> On your perl file, can you give me an example of how to use it? I tried
> it (in Windows, with the script named utf2tex.pl and the unicode text
> in unicode-utf.txt) and got
>
> =========================
> > perl utf2tex.pl unicode-utf.txt
> Unknown discipline class ':utf8' at C:/Perl/lib/open.pm line 18.
> BEGIN failed--compilation aborted at utf2tex.pl line 4.
> =========================
>
> from your script I tried, e.g.
>
> ============================
> $_ =~
> s/\x{0627}/\x{0041}/esg;
> # from alif to `A'
> ============================
>
> Your guidance will be greatly appreciated!
>
> Thnx a million!
> Idris