Re: unicode file APIs (was: Re: canonical stuff)

William A. Rowe, Jr. 24 Feb 2001 23:45:41 -0000

[Moved strictly to dev@apr.apache.org - since this seems to _not_ be a
discussion of apache, but primarily of an API for other APR users.]

From: "Greg Stein" <[EMAIL PROTECTED]>
Sent: Saturday, February 24, 2001 5:27 PM

> On Sat, Feb 24, 2001 at 11:31:49AM -0600, William A. Rowe, Jr. wrote:
> > From: "Greg Stein" <[EMAIL PROTECTED]>
> > Sent: Saturday, February 24, 2001 3:44 AM
> >...
> > > In a similar vein, when you added all that Unicode stuff, it just kind of
> > > dropped into the code. No big deal as it was all Win32 specific (i.e. it
> > > didn't affect my playground), but it was an awfully big change. Especially
> > > in the semantics. We still haven't refactored the API into two sets of
> > > functions (one for Unicode chars, one for 8-bit native).
> > 
> > I'm absolutely positively near certain we won't.  Please let me explain.
> >
> > ... lot of stuff about why Unicode filenames are Goodness ...
> 
> I don't disagree with wanting Unicode filenames. I completely disagree with
> APIs that change their semantics based on the platform they are compiled on.

And I'm _arguing_ that the semantics _do_ change, regardless of 
APR_HAS_UNICODE_FS.

Simply put - Win32 has a restricted set of characters.  Not only is it a
restricted set of characters, but alpha chars map from upper to lower case
in very unpredictable ways.  By unpredictable, I mean that the clib
tolower()/toupper() _never_ matches the mappings that the Win32 filesystem
performs.  That's a very nasty side effect that isn't really very
tolerable.  Of course, we also eliminate a number of symbols on Win32 that
simply aren't supported, but are perfectly legal on Unix.
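To make the two Win32 restrictions above concrete, here is a minimal C
sketch.  The names win32_component_ok() and clib_fold() are hypothetical,
not APR functions.  The first rejects the control characters 1-31 and the
characters < > : " / \ | ? * that Win32 refuses in a file name.  The second
shows that in the default "C" locale tolower() folds only 'A'-'Z' and hands
a high-bit byte such as 0xC4 back unchanged, so byte-wise clib folding
cannot reproduce whatever case mapping the filesystem applies to that
character.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a single path component contains only
 * characters Win32 accepts in a file name, 0 otherwise.  It rejects the
 * control characters 1-31 and the reserved set < > : " / \ | ? *, which
 * are all legal in a Unix file name except '/'.  Reserved device names
 * (CON, NUL, ...) and trailing dots or spaces are NOT checked here. */
int win32_component_ok(const char *name)
{
    const unsigned char *p;

    for (p = (const unsigned char *)name; *p != '\0'; ++p) {
        if (*p < 32 || strchr("<>:\"/\\|?*", *p) != NULL)
            return 0;
    }
    return 1;
}

/* Case folding as the C library sees it.  In the "C" locale (the locale
 * every program starts in) tolower() maps only 'A'-'Z'; any byte >= 0x80
 * is returned unchanged. */
unsigned char clib_fold(unsigned char c)
{
    return (unsigned char)tolower(c);
}
```

For example, win32_component_ok("Bite Me.txt") accepts the space, while
win32_component_ok("what?.txt") and win32_component_ok("a:b") fail, even
though both are ordinary Unix names.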

OTOH, spaces are not a problem, as they seem to be for Unix.

> If I have an application that I desire to be portable, then I'm going to use
> APR to do it. In my app, I call apr_file_open(some_8bit_name). That should
> work on all platforms. With the current single API, it will break on NT when
> compiled with the Unicode stuff.

What is portable here?  There is nothing portable about high-bit
characters.  Other than as opaque data, you can't make many assumptions
about them without an API that we haven't defined for APR.  Not that we
shouldn't.  Not that it shouldn't map the characters appropriately for
_whatever_ code page the user desires.  But it simply doesn't parse.  Local
code pages are not effective for file naming, for most applications,
unless more information is known about the system.  We don't have a way
to provide that information.
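A small illustration of why a high-bit byte in a file name means nothing
without its code page.  The two hypothetical functions below hold
hand-written excerpts of ISO-8859-1 and IBM code page 437.  They cover only
a couple of entries and are not a real conversion API; real code would go
through something like iconv() or MultiByteToWideChar().  The same byte
0x81 is the C1 control U+0081 in one and 'ü' (U+00FC) in the other.

```c
/* Hypothetical excerpt table: ISO-8859-1 maps every byte value to the
 * Unicode code point with the same number. */
long latin1_to_ucs(unsigned char b)
{
    return (long)b;
}

/* Hypothetical excerpt of IBM code page 437 (the original PC/DOS OEM code
 * page).  Only ASCII and two high-bit entries are tabulated; any other
 * byte returns -1. */
long cp437_to_ucs(unsigned char b)
{
    if (b < 0x80)
        return (long)b;
    switch (b) {
    case 0x81: return 0x00FCL;   /* LATIN SMALL LETTER U WITH DIAERESIS */
    case 0xC4: return 0x2500L;   /* BOX DRAWINGS LIGHT HORIZONTAL */
    default:   return -1L;       /* not in this excerpt */
    }
}
```

So a name stored as the bytes "Bite\x81Me.txt" reads as BiteüMe.txt to a
CP437 user and as a name containing an unprintable control character to an
ISO-8859-1 user.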

> None of the APIs change their semantics. They exist or they don't, but they
> don't change.
> 
> The answer is to have apr_file_open_u() for opening with Unicode filenames,
> not changing the encoding of the existing apr_file_open. You completely
> break all possibility of writing portable apps when you do that. And APR is
> *about* writing portable apps.

What does apr_file_open_u() do on Unix?  I would expect nothing.  Unless
you have a utf-8 build of Unix (and there are some), this is pretty
meaningless.  But what _if_ the user is building apr under a utf-8 powered
Unix?  Is the filename Bite%x81Me.txt (a raw 0x81 byte) accepted?  I can't
answer the question.  What happens if it is accepted and created?  What
does ls Bite* do?  That character alone is a continuation character with
no lead byte.  Does ls show anything worthwhile?
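To make the continuation-byte point concrete, the hypothetical helper
utf8_valid() below checks whether a byte string is well-formed UTF-8.  It
follows the well-formed byte sequence table in the Unicode Standard,
rejecting stray continuation bytes, overlong forms, surrogates and values
above U+10FFFF.  Whether any particular UTF-8 Unix filesystem performs such
a check is exactly the open question; this sketch only shows what a check
would have to catch.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s[0..len) is well-formed UTF-8,
 * 0 otherwise. */
int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;                           /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (c < 0x80) { i++; continue; }    /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }  /* no overlongs */
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }  /* no surrogates */
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }  /* no overlongs */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }  /* <= U+10FFFF */
        else return 0;  /* stray continuation (0x80-0xBF), 0xC0, 0xC1, 0xF5+ */

        if (len - i <= n)
            return 0;                       /* sequence truncated */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (size_t k = 2; k <= n; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return 0;
        i += n + 1;
    }
    return 1;
}
```

With this, "Bite\x81Me.txt" is rejected at the lone 0x81, while
"Bite\xC3\xBCMe.txt", the UTF-8 spelling of BiteüMe.txt, is accepted.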

I'm saying stop even looking at Win32 for 10 minutes, and examine the
bigger issues that allow this to become a cross-platform API.  Then we can
begin the process of determining an _appropriate_ api to cover these
issues.

There is nothing that says _any_ filesystem accepts high bit characters,
except that some do.  How can we relate this to the user and the coder?
I don't have an answer; I simply believe that apr_functions_u() is
anything but the common denominator.

Bill
