On Thursday 20 April 2006 12:24, you wrote:
> will this work using unicode on a bengali e-book
>
> ..peekay
Dear Peekay,
Good to hear from you after a long time.
No, I don't think it will work like that. The script works on plain ASCII text.
In fact, after you sent me this mail, I tried it on a Unicode Bangla document,
b-odt, that I am now writing with OOo 2.0, saved it as a text file,
b-oo.txt, and ran the script on it. It always gives an odd result: '1'.
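I think the problem lies in how the non-ASCII characters are handled. In the
script, 'tr' changes every character other than A-Z, a-z and the apostrophe
into a newline, so that the whole text becomes a long list of words, one per
line. Bangla letters fall outside that set, so they are all replaced as well.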
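To illustrate what I think is happening, here is a small sketch that runs the same 'tr' step on two made-up sample strings (they are my assumptions, not taken from b-oo.txt). I have added LC_ALL=C only to make 'tr' work byte by byte in a predictable way; the original command does not set it.

```shell
# English sample: letters and apostrophes survive, and every run of
# other characters is squeezed into a single newline.
ascii_words=$(printf "Don't panic, it works.\n" | LC_ALL=C tr -cs "A-Za-z'" '\n')
echo "$ascii_words"

# Bangla sample: no byte of the UTF-8 text is in A-Za-z', so the whole
# input is replaced and squeezed down to one newline.
bangla_bytes=$(printf 'বাংলা বই\n' | LC_ALL=C tr -cs "A-Za-z'" '\n' | wc -c | tr -d ' ')
echo "bytes left after tr: $bangla_bytes"
```

The first echo should list the four English words one per line, while the second should report that only a single byte is left of the Bangla sample.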
I tried the first step of the script on the file:
tr -cs A-Za-z\' '\n' <b-oo.txt > b-oo-list.txt
When I opened this list with 'less' or 'cat' it showed nothing, and the
octal dump ('od') gave only:
0000000 000012
0000001
So 'tr' in the first step is passing on nothing but a single newline (octal
012) for the later steps to work on.
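As a cross-check of how I read that dump, the sketch below builds a file containing just one newline and looks at it byte by byte. The file name lone-newline.txt is hypothetical, and the guess that a later step counting lines would then report '1' is only my assumption about where the odd result comes from.

```shell
# A file holding exactly one newline byte (hypothetical file name).
printf '\n' > lone-newline.txt

# Dump it one byte at a time in octal, without the offset column.
dump=$(od -An -t o1 lone-newline.txt | tr -d ' ')
echo "octal bytes: $dump"

# Count the lines in the file.
lines=$(wc -l < lone-newline.txt | tr -d ' ')
echo "line count: $lines"
```

If the dump shows the single byte 012 and the line count is 1, that matches what I saw for b-oo-list.txt.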
This is beyond my very limited knowledge of text and its formats. I am sending
it both to you and to our learned friends on the LUG, in case someone can help.
If anybody needs them, I can send the odt file and the text file.
Thank you for an interesting problem.
dipankar das