On Thu, Nov 26, 2015 at 7:49 PM, Branko Čibej <br...@apache.org> wrote:

> On 26.11.2015 22:55, William A Rowe Jr wrote:
> > On Nov 26, 2015 11:03 AM, "Branko Čibej" <br...@apache.org> wrote:
> >> On 26.11.2015 15:44, William A Rowe Jr wrote:
> >>> Better if I address this Q to svn folks at the APR project :)
> >>> On Nov 26, 2015 08:39, "William A Rowe Jr" <wr...@rowe-clan.net> wrote:
> >>>
> >>>> Sounds right... Actually a fusion between svn_cstring_* and several
> >>>> existing ap_ and apr_ functions would be useful.
> >>>>
> >>>> SVN folk, any objection to APR appropriating these APIs?  With 20/20
> >>>> hindsight, is apr_cstring_ or the shorter apr_cstr_ the way to go here?
> >>>> You all had to use the thing, so I trust your preferences.  Either
> >>>> expresses locale C in my mind, so they work for me.
> >> Note that the svn_cstring* functions have *nothing* whatsoever to do
> >> with the "C" locale; they manipulate nul-terminated "C" strings, that's
> >> all.
> >> svn_cstring_casecmp depends on svn_ctype_casecmp; the svn_ctype
> >> functions are expected to only work on the ASCII subset.
> >>
> >> -- Brane
> > Understood.
> >
> > Unlike svn, we still support EBCDIC, so the use of the phrase 'ASCII' is
> > unnecessarily confusing.
> >
> > The aliases C and POSIX both refer to the locale you describe.  Only ASCII
> > digits are recognised, only ASCII punctuation is honored, only ASCII alpha
> > characters are case-folded.
> >
> > Or the associated characters in the EBCDIC set.  All other byte values are
> > opaque.
> >
> > GLib deemed this important enough to add the g_ascii_str* family of
> > functions.
> >
> > We are saying and reading the same thing, just using different terms to
> > describe cstring.
>
> Well, not exactly; the svn_cstring_casecmp is the only function in that
> group that works as if it were always in the "C" locale. The others are
> simply a convenience for managing variable-length nul-terminated
> strings. In Subversion, for example, their contents are usually encoded
> in UTF-8.
>

To clarify, in the "C" locale, UTF-8 is just fine.  Do the other functions
treat the opaque UTF-8 high-bit-set characters specially, or do they simply
treat them as individual distinct bytes?

If it is the latter, that conforms to the C/POSIX locale, but if they are
actually handled explicitly as UTF-8 sequences, that gets a little more
tricky.
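
To make the distinction concrete, here is a minimal sketch of the
"individual distinct bytes" behaviour.  The function names are hypothetical
(this is not the existing svn_cstring_casecmp or any APR function), and it
assumes an ASCII-based execution character set; an EBCDIC build would need a
different folding rule.

```c
/* Fold only ASCII 'A'..'Z' to lowercase.  Every other byte value,
 * including the high-bit-set bytes of a UTF-8 sequence, passes through
 * unchanged.  Assumption: ASCII-based execution character set. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical name, for illustration only.  Compares two nul-terminated
 * strings byte by byte, ignoring case for ASCII letters and nothing else.
 * Returns <0, 0 or >0 in the manner of strcmp(). */
int cstr_casecmp_sketch(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;;) {
        unsigned char ca = fold_ascii(*pa++);
        unsigned char cb = fold_ascii(*pb++);

        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == '\0')
            return 0;
    }
}
```

With this sketch, "Content-Type" and "content-type" compare equal, while the
UTF-8 encodings of "É" (0xC3 0x89) and "é" (0xC3 0xA9) do not, because the
bytes above 0x7F are never folded and the result does not depend on
setlocale().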


> ASCII vs. EBCDIC (or any other single-byte encoding) is really only a
> matter of using different case folding and codepoint attribute tables
> (or equivalents; there's no reason the implementations have to be
> table-driven). More complex encodings are pretty much out of scope, IMO.
>

And we agree from an httpd perspective.  The issue is that we must handle
all ASCII (RFC-defined) sequences specifically and have no side-effects
that we weren't expecting, from a hardening perspective.

> In any case, I don't think anyone over at dev@s.a.o would object to APR
> including those functions. We actually have a number of other, heh,
> improvements on APR that we could "donate"; we just never really got
> around to producing the necessary patches.
>

I hope that, as we start discussing 2.0 in more detail, some of these come
through.  But I'm inclined not to wait and to begin forking this specific
API as something that httpd 2.next needs, and that some future version of
2.4.x may decide it must adopt.

Thanks for the insights,

Bill
