John Chambers wrote:
> Lest you think this is way off topic, I might mention that I've  been
> involved  in  attempts to use non-ASCII char sets in my ABC tunes.  I
> have a lot of "international folk  dance"  tunes,  and  it  would  be
> really nice to be able to spell the titles right. Also, I like to use
> single-tune files as my  primary  data  (with  little  programs  that
> combine them for pages of tunes). It's really handy if the tune title
> can be used in the file name.  I've done this on my linux box, and at
> least Latin-1 names work there.  But when I rsync a directory over to
> my Mac Powerbook, it goes berserk on the files with non-ASCII letters
> in the names.
> 
> This tells me that OSX "isn't ready for prime  time"  in  the  coming
> international world. If it can't even handle a simple 'ä' or 'ö' in a
> file name, how is it ever going to handle Chinese  or  Japanese  file
> names?  It can't even handle a Finnish or Arabic file name. You can't
> expect those people to use English file names.  (Well, the  Finns  do
> all speak English these days, but still ...  ;-)

Snip...
 
> One question for our Scandinavian friends: Do any of  you  use  Macs?
> Can  you  get  filenames  that  contain the non-ASCII letters in your
> alphabet? If so, how do you make it work right? I've tried setting my
> charsets  to  8859-1  and  UTF-8 and others, and none of them seem to
> make the files in my .../Scand/  directory  copy  correctly  from  my
> linux  box.  Copying between linux to this FreeBSD system works fine,
> because those systems treat a character as unanalyzed bits.  But when
> copying to OSX, those files end up with gibberish names.

Mac OS X has full support for Unicode, although not all of the BSD UNIX
utilities that have been ported over support Unicode to its fullest
extent, so there are oddities when you use the command line.  That said, of
all the systems I've ever programmed on, the Mac has the best international
support -- internationalization has been a strong point for Macs since the
early days.

Apple's HFS+ file system (the Mac OS X default file system) stores filenames
in UTF-16 Unicode format.  This means I can (and do) have files with names
in just about any language, using just about any characters from anywhere in
the Unicode code set (including mixing and matching entirely different
scripts in one name).  What happens to those names when they are transferred
to a Linux or Windows system, I can't say.
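
As a rough illustration of what "stored in UTF-16" means, here is a small
Python sketch.  It is not Apple's code, and utf16_units is a helper name
made up for this example; it only shows how characters map to 16-bit units.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

# A precomposed 'a with diaeresis' (U+00E4) fits in one 16-bit unit.
print([hex(u) for u in utf16_units("\u00e4")])

# A character outside the 16-bit range (U+1D11E, a musical G clef)
# takes two units, a surrogate pair.
print([hex(u) for u in utf16_units("\U0001d11e")])

# Latin, Cyrillic, and Japanese letters can sit side by side in one name.
mixed = "polska-\u043f\u043e\u043b\u044c\u043a\u0430-\u30dd\u30eb\u30ab.abc"
print(len(mixed), len(utf16_units(mixed)))
```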

The problem you're seeing is not that the Mac doesn't support
internationalization, it's that it doesn't have any way of telling what the
encoding is for the filenames you're giving it.  Most filesystems out there
(with a couple of exceptions, like HFS+ and NTFS, which store filenames in
UTF-16) store filenames as plain 8-bit strings.  To get international
filenames, they use either different 8-bit charsets or UTF-8.  But there's
*nothing* in the filesystem itself which says "this filename is encoded in
format XXXX".  That information is stored in the OS application layer as a
*display* parameter.  So it all looks correct on that system, because the OS
translates it into the right characters when they get displayed.
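
To make that concrete, here is a small Python sketch.  The tune title is
made up for illustration, and Latin-1 and UTF-8 are just the two encodings
mentioned in this thread; the point is only that the stored bytes carry no
label saying which one was used.

```python
# Hypothetical tune file name; the non-ASCII letter is written as an escape.
title = "H\u00e4rj\u00e4n polska.abc"

# The same name as two different 8-bit strings.
latin1_bytes = title.encode("latin-1")
utf8_bytes = title.encode("utf-8")
print(latin1_bytes)
print(utf8_bytes)

# Reading UTF-8 bytes as Latin-1 "works", but produces gibberish.
mangled = utf8_bytes.decode("latin-1")
print(mangled)

# Reading Latin-1 bytes as UTF-8 fails outright for this name.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError as err:
    latin1_is_valid_utf8 = False
    print("not valid UTF-8:", err.reason)
```

Either outcome matches the "gibberish names" symptom: the receiving system
has to guess, and a wrong guess either garbles the name or rejects it.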

But when you try to transfer the file to another system, all the receiving
side knows is that it is named some weird 8-bit string.  This is why the
name gets mangled in
the translation.  It's even worse when you send it via email, because you
have to hope the email programs on both sides know how to deal with the
encodings you are sending.  I suspect rsync is the culprit in your case -- I
seriously doubt that's been made Unicode aware.

Probably your safest bet for a valid transfer is to burn the files to a CD
using ISO-9660 format.  There *is* a standard for filenames stored like
this, that most systems ought to be able to read.

It also should be possible to write a simple Demangle application which
would read in a filename (or a directory of filenames), and given an
encoding specified by the user, would translate it to Unicode and rename the
file appropriately.  Shouldn't be too complicated to write -- the standard
OSX string routines have all kinds of support for translating strings
between various encodings.
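
Here is a minimal sketch of that idea in Python rather than with the OSX
string routines, so treat it as an illustration and not a finished tool.
The function name demangle is made up, and I am assuming that the target
should be UTF-8 and that the script is run on the source (Linux or FreeBSD)
side, where filenames are available as raw bytes.

```python
import os

def demangle(directory, encoding):
    """Rename files whose names are not valid UTF-8, assuming `encoding`.

    Returns a list of (old_name, new_name) pairs as bytes.
    """
    renamed = []
    dir_bytes = os.fsencode(directory)
    # Passing bytes to os.listdir returns the raw, undecoded filenames.
    for raw in os.listdir(dir_bytes):
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8 (plain ASCII included)
        except UnicodeDecodeError:
            pass
        try:
            new_raw = raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue  # the user-supplied encoding does not fit this name
        src = os.path.join(dir_bytes, raw)
        dst = os.path.join(dir_bytes, new_raw)
        if os.path.exists(dst):
            continue  # never overwrite an existing file
        os.rename(src, dst)
        renamed.append((raw, new_raw))
    return renamed
```

Something like demangle("Scand", "latin-1") would then convert a directory
of Latin-1 names before copying it over.  Names that already decode as
UTF-8 are left alone, which is a guess on my part about what is safest.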

-->Steve Bennett

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
