On Tuesday 07 September 2004 12:56, Yedidyah Bar-David wrote:
> On Tue, Sep 07, 2004 at 10:39:08AM +0300, Amir Hardon wrote:
> [snip]
>
> > Yes, I realize that I can't find the encoding, but I don't really need to
> > do it.
> > I only need to have the answer for 'can it be iso8859-1?',
>
> So why not simply do
> iconv -f iso8859-1 -t utf-8 < file > /dev/null
> and see if there was an error (by checking stderr or the return value)?
>
> > if the answer is yes then nothing bad can happen from the conversion
> > (Correct me if I'm wrong).
>
> I think you are wrong.

You are right... )-:

> Suppose a certain filename is both a legal
> iso8859-1 string and a legal iso8859-8 string. What would you do?
> If you convert from iso8859-1 to cp850 you'll get something different
> than if you convert it from iso8859-8 to cp862. So what would you do?
> I did not look at enca, but if it makes a serious attempt, it uses
> a dictionary.
>
> > enca doesn't look standard (It's not even in the debian tree).
>
> I agree. Your needs are probably also not very standard.
>
> > I'm sure there's a way to implement this test with the standard tools...
>
> The test, yes. Finding the encoding/language - tough one.

Since there is no sure way to verify the encoding before the conversion, I
chose to convert without verification.
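
For the archives, here is a small sketch of why the iconv test cannot settle
the question. The helper name can_decode_as is made up for this example. It
feeds the single byte 0xE0 through iconv twice: the byte is valid in both
iso8859-1 and iso8859-8, but it stands for a different character in each.
Note also that iso8859-1 assigns a character to every byte value, so a
"can it be iso8859-1?" test done this way never says no.

```sh
#!/bin/sh
# Hypothetical helper: succeed if all of stdin is valid in encoding $1.
can_decode_as() {
    iconv -f "$1" -t UTF-8 > /dev/null 2>&1
}

# Byte 0xE0 (octal 340) passes the test for both encodings.
for enc in ISO-8859-1 ISO-8859-8; do
    if printf '\340' | can_decode_as "$enc"; then
        echo "$enc: valid"
    else
        echo "$enc: invalid"
    fi
done

# The same byte decodes to two different characters, so a successful
# test says nothing about which encoding was actually meant.
as_latin1=$(printf '\340' | iconv -f ISO-8859-1 -t UTF-8)
as_hebrew=$(printf '\340' | iconv -f ISO-8859-8 -t UTF-8)
if [ "$as_latin1" != "$as_hebrew" ]; then
    echo "same byte, different characters"
fi
```

Telling the two readings apart would need knowledge of the language, which
is presumably why a tool like enca would have to use a dictionary.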

The script is available at http://amir.hardon.co.il/scripts/unzipconv.sh for
anyone who is interested.
Just make sure to use it only on files extracted with unzip, and only once. It 
may do nasty things to your filenames if you make it convert other stuff.
If you use Unicode then you will need to change the encoding in the script.
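
To give an idea of what "changing the encoding" means, here is a rough
sketch of this kind of filename conversion. It is not a copy of
unzipconv.sh. The function names, the FROM_ENC and TO_ENC variables, and the
default values (cp862 to iso8859-8) are assumptions made for the example. A
Unicode user would set TO_ENC to UTF-8.

```sh
#!/bin/sh
# Assumed defaults for illustration; override them in the environment.
FROM_ENC=${FROM_ENC:-CP862}
TO_ENC=${TO_ENC:-ISO-8859-8}

# Print the name given in $1, converted from FROM_ENC to TO_ENC.
convert_name() {
    printf '%s' "$1" | iconv -f "$FROM_ENC" -t "$TO_ENC"
}

# Rename every entry directly inside directory $1.
rename_converted() {
    dir=$1
    for old in "$dir"/*; do
        [ -e "$old" ] || continue           # the glob matched nothing
        base=${old##*/}
        new=$(convert_name "$base") || continue   # skip names iconv rejects
        [ "$new" = "$base" ] && continue    # plain ASCII name, nothing to do
        [ -e "$dir/$new" ] && continue      # never overwrite an existing file
        mv -- "$old" "$dir/$new"
    done
}

# Example use: rename_converted ./extracted
```

Running something like this twice over the same directory converts names
that were already converted, which is why it should be used only once.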

-Amir.
