Yesterday at 13:07, praveen kumar sivapuram wrote:

> I am developing an application that needs to know the format of
> file names. Based on the documents I found on the web, I
> understand that Linux file systems do not impose any specific
> format on file names. Users can create files using either
> UTF-8 or ISO-XXXXX-X.

Or any of the hundreds of other ASCII-compatible encodings.  (FWIW,
you can probably use some non-ASCII-compatible encodings as well, but
you may run into problems, since the kernel treats file names as plain
byte strings in which only "/" and NUL are special.)

> Here are my questions:
>  
> 1. Is there any way to identify whether the filename is UTF8 or non-UTF8?

If the entire file name is a valid UTF-8 string, then it's likely to be
UTF-8 (it's possible that it's not really UTF-8, but very unlikely!).
To check whether a string is valid UTF-8, consult the UTF-8 RFC
(RFC 3629).
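
In practice you rarely need to implement the RFC rules by hand. As a
minimal sketch in Python (is_valid_utf8 is a hypothetical helper name,
not part of any library), a strict decode of the raw file name bytes
rejects truncated sequences, overlong forms and surrogates:

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes form a valid UTF-8 string."""
    try:
        # Strict decoding raises on any byte sequence the RFC forbids.
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Passing a bytes path, as in os.listdir(b"."), makes Python return the
names as raw bytes, which is what this check expects.  Note that a
pure-ASCII name always passes, so it tells you nothing about which
encoding the user intended.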

> 2. Is there any known application which still uses ISO-8859XXX codesets for 
> creating file names?

Many old (and new?) applications use the current character set of the
system (set through LC_ALL, LC_CTYPE, or LANG).  I'd suggest that all
new applications use UTF-8.
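
To see which character set such an application would pick up, you can
follow the usual POSIX precedence for the LC_CTYPE category: LC_ALL,
then LC_CTYPE, then LANG.  A rough sketch in Python, where
ctype_codeset is a hypothetical helper and the parsing assumes the
common language[_territory][.codeset][@modifier] form of locale names:

```python
def ctype_codeset(env):
    """Return the codeset named by the locale environment, or None.

    Precedence for LC_CTYPE: LC_ALL, then LC_CTYPE, then LANG.
    An empty value counts as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        return None  # none of the three variables is set
    # Drop an optional "@modifier" suffix such as "@euro".
    value = value.split("@", 1)[0]
    if "." not in value:
        # e.g. "C" or "de_DE": the codeset is left to the system default.
        return None
    return value.split(".", 1)[1]
```

Calling ctype_codeset(os.environ) only inspects the environment.  What
the C library actually resolved can be read, on Unix, with
locale.nl_langinfo(locale.CODESET) after calling
locale.setlocale(locale.LC_CTYPE, "").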

> 3. Can I safely assume that all files on a particular system will be in the
> same format? i.e. all file names will be either UTF-8 or some other ISO codeset?

No.  Different users might be running different locales, and the
"old" applications mentioned above might assume file names to be in
the user's locale encoding.  Of course, if a user switches locales
often, all kinds of mess-ups can occur, unless she consistently uses
UTF-8 (or another language-agnostic encoding) for naming files.


In practice, for most single-user systems, you can make such an
assumption.
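
If you would rather check that assumption on a given machine than
trust it, you can survey the names that are already there.  A sketch
in Python (classify_name and survey are hypothetical helper names)
that counts how many names are pure ASCII, valid non-ASCII UTF-8, or
something else:

```python
import os
from collections import Counter


def classify_name(name: bytes) -> str:
    """Classify raw file name bytes as 'ascii', 'utf-8' or 'other'."""
    if all(b < 0x80 for b in name):
        return "ascii"  # valid in every ASCII-compatible encoding
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return "other"  # some legacy encoding, or just arbitrary bytes
    return "utf-8"


def survey(root: bytes) -> Counter:
    """Count the classes of all file and directory names below root."""
    counts = Counter()
    # A bytes root makes os.walk yield names as raw bytes.
    for _dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            counts[classify_name(name)] += 1
    return counts
```

For example, survey(os.fsencode("/home")) gives a quick picture: if
the "other" count is zero, treating all names as UTF-8 is probably
safe on that system.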


Cheers,
Danilo

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
