On Wed, Dec 31, 2003 at 04:38:02PM +0200, Chaim Keren Tzion wrote:
> I tried 'convmv' and it worked great. One point to be aware of: I had two 
> directory structures and it seems that one was in cp862 and one was in
> iso-8859-8. At first I ran the same command on both directories:
> convmv -r -f cp862 -t utf8 --nfc directory1
> That worked fine for the cp862 encoded directory but it messed up the
> iso-8859-8 one.

That means convmv is not that smart. I suggest you report it to the author.
IMO, it should have done nothing.

> Good thing I backed them up first. I then had to play around to figure 
> out what encoding the second directory was in and then ran:
> convmv -r -f iso-8859-8 -t utf8 --nfc --notest directory2
> It worked fine and I am now all UTF8.
> 
> One question though; Is there a way to query what encoding a file or 
> directory's name is in? I had to just keep trying different 'from' encodings 
> until it worked.

There is no way to "query" it - the encoding is not recorded anywhere. The
only thing you can do is _guess_ it. If you know it's Hebrew, there are only
a few possibilities. You can simply do 'ls --show-control-chars | od -tx1'
and look at the raw bytes - in cp862 the Hebrew letters are hex 80-9A, and
in iso8859-8 they are hex E0-FA. If you don't know the language, you need a
smarter tool. I think one such tool is mguesser, but it has no maps for
cp862 so I didn't try it (but it's probably trivial to convert its
iso8859-8 map to cp862). It also needs a large amount of data to work on -
I guess it compares distributions of letters to known languages'
distributions.
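
If you'd rather not read the od output by eye, here is a rough sketch of the
same byte-range check as a shell function. The name guess_hebrew_encoding is
made up for this example, and it assumes the name is in one of the two
encodings above: a name that is already UTF-8 would be misreported, and a
pure-ASCII name comes back as "unknown".

```shell
#!/bin/sh
# Hypothetical helper: guess whether a Hebrew file name is cp862 or
# iso-8859-8 by counting the bytes that fall in each encoding's letter range.
# Assumption: the name is in one of those two encodings (not UTF-8).
guess_hebrew_encoding() {
    # cp862 Hebrew letters: hex 80-9A (octal 200-232)
    n862=$(( $(printf '%s' "$1" | LC_ALL=C tr -cd '\200-\232' | wc -c) ))
    # iso-8859-8 Hebrew letters: hex E0-FA (octal 340-372)
    n8859=$(( $(printf '%s' "$1" | LC_ALL=C tr -cd '\340-\372' | wc -c) ))

    if [ "$n862" -gt "$n8859" ]; then
        echo cp862
    elif [ "$n8859" -gt "$n862" ]; then
        echo iso-8859-8
    else
        # No high bytes in either range, or a tie: cannot tell.
        echo unknown
    fi
}
```

Call it on each bare file name (not the full path, so the parent directory's
bytes are not counted), and treat the answer as a hint to check before
passing it to convmv as the 'from' encoding.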
-- 
Didi

