On Tuesday 07 September 2004 07:58, you wrote:
> On Tue, Sep 07, 2004 at 02:41:02AM +0300, Amir Hardon wrote:
> > After Didi gave me the solution for the encoding problem with unzip I
> > decided to write a script for converting the filenames (maybe I'll patch
> > unzip in the future, but that's my temporary solution).
> >
> > The script I wrote just changes the file and directory names using
> > iconv, but if it tries to handle an already iso8859-8 encoded filename,
> > iconv gives an error (the validity check fails: it expects the input
> > string to be encoded as iso8859-1 but it is not).
> > Is there any standard unix tool I can use to first check the string's
> > encoding and then decide whether to convert it or not?
>
> You do realize this is pure guesswork, don't you? No one can tell you the
> correct answer for sure. That said, a tool that tries to do this is
> called 'enca'. I never tried it myself, though.
>
> If all you want is to know if a given string is iso8859-1 or 8859-8,
> then I would personally simply do a range check - Hebrew should have
> bytes below 127 or between 224 and 250. Note!!! This isn't
> accurate, and I'd never do this on a large mission-critical tree
> without a good backup.
>
> > Thanks,
> > -Amir.

Yes, I realize that I can't determine the encoding for certain, but I don't
really need to. I only need the answer to 'can it be iso8859-1?'; if the
answer is yes, then nothing bad can happen from the conversion (correct me
if I'm wrong).
enca doesn't look standard (it's not even in the Debian tree).
I'm sure there's a way to implement this test with the standard tools...
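
For illustration only, here is a minimal sketch of the range check suggested in the quoted reply, written in Python rather than with shell tools. The function name is hypothetical, and treating the whole ASCII range (0..127) as acceptable is an assumption; the quoted text says "below 127". As the reply warns, this is a guess and not a real detector: in iso8859-1 the bytes 224..250 are mostly accented lowercase letters, so such names pass the check as well.

```python
def could_be_iso8859_8_hebrew(name: bytes) -> bool:
    """Heuristic range check on a raw filename.

    Returns True when every byte is either ASCII (0..127, an assumption)
    or falls in 224..250, the range where ISO-8859-8 places the Hebrew
    letters. A True result only means "possibly already iso8859-8".
    """
    return all(b < 128 or 224 <= b <= 250 for b in name)


if __name__ == "__main__":
    # Hypothetical sample names, given as raw bytes.
    samples = [
        b"readme.txt",            # pure ASCII
        b"\xf9\xec\xe5\xed.txt",  # Hebrew letters in ISO-8859-8
        b"\xc4rger.txt",          # byte 196 lies outside both ranges
    ]
    for raw in samples:
        print(raw, could_be_iso8859_8_hebrew(raw))
```

A conversion script could skip any name for which the check returns True and convert only the rest, ideally on a copy of the tree, given how rough the heuristic is.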
