From: "Luke Kenneth Casson Leighton" <[EMAIL PROTECTED]>
Sent: Wednesday, June 13, 2001 7:17 AM

> On Tue, Jun 12, 2001 at 11:46:30AM -0500, William A. Rowe, Jr. wrote:
> > From: "Luke Kenneth Casson Leighton" <[EMAIL PROTECTED]>
> > Sent: Tuesday, June 12, 2001 10:22 AM
> >
> > > how would the idea of having an apr_ucs16 set of routines,
> > > apr_wstrcat, apr_wstrcpy, apr_wtolower, apr_wtoupper etc.,
> > > be received?
> >
> > Well, since apr_isfoo apr_tofoo was 'reinvented', I don't see a
> > huge problem.
>
> cool.

But please take a look first at the dialog that's started under iconv; this is
a one way ticket to solving one specific problem. If we implement under
apr_iconv, we can accomplish a lot more. mod_autoindex could get exactly
20 characters of description, even when these are 20 bytes, 33 bytes or
40 bytes.

> > > on nt, it's easy: straightforward usage of the NT
> > > wstrcat, wstrcpy etc. lines.
> >
> > These are the folks who never read the "Security Implications" of utf-8
> > leaving 40% of all IIS webservers still vulnerable, so I'm dubious :-)
>
> *grin*.
>
> btw, samba #defines strcpy to ERROR_USE_SAFE_STRCPY_INSTEAD etc.
>
> sorry, forgot about this. okay, rewrite that: how
> about an equivalent apr_pwstrcat, apr_pwstrcpy with all
> the safety / security / paranoia therein?

Again, this is why we shouldn't simply 'do' a Unicode wrapper that is inferior.

> > Well, how about a simple question. Why restrain ourselves to ucs2?
>
> because it's what NT has: NT doesn't have 32-bit (ucs4?) unicode, afaik,
> only 16-bit (ucs2?)

Ok, NT uses 16 bit unicode, later 2000 releases add the double-word pairs.
But why are you exposing for WinNT?

Here's the kick, apr is a byte oriented interface to the OS. It will never
be otherwise. When I say byte oriented, I mean any internationalization needs
to use something simple and transparent, such as utf-8. That's what we are
doing, right now.

If you want to extend unicode treatment internally as accessors (which I did
with the fast and safe utf8/ucs2 conversion) then I'm all for it, if it helps
us. But those are internals.
The rest of the world is still byte oriented. This is a compatibility layer,
so we need to focus apr in that direction.

> > Can iconv/apr_iconv provide this in a charset-opaque manner? That is, if
> > I want three 'characters' in shift-jis, can it give me the right number
> > of bytes? The reason is simple, Unicode is already splintered into a
> > multi-word character set anyways. I suspect it's easier to just get it
> > right, knowing the apr_xlate that's been opened, and asking for the char
> > len v.s. the byte len (sizeof) and providing the strcpy/cmp, etc.
>
> you need to be able to wtoupper, wtolower etc. that requires
> a lookup table. samba has an optimised lookup table of the
> standard ucs2 upper/lower conversion tables that is small enough
> to fit into the 2nd-level cache of an intel processor.

Then let's not start adding things willy nilly. We have apr_iconv due to
portability, let's build upon that. It should be across character sets, so
we can handle this stuff in an opaque manner.

Bill
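
To illustrate the char len vs. byte len distinction raised above, here is a
minimal sketch in plain C. It assumes the input is valid, NUL-terminated
utf-8 only; a truly charset-opaque version (shift-jis and so on) would need
to know which translation is open, which this does not attempt. The names
utf8_is_lead, utf8_char_count and utf8_prefix_bytes are hypothetical and are
not part of apr or apr_iconv.

```c
#include <stddef.h>

/* Hypothetical helper, not an apr function: a byte begins a new utf-8
 * character unless it is a continuation byte of the form 10xxxxxx. */
static int utf8_is_lead(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

/* Count characters (code points) in a NUL-terminated string.
 * Assumption: the string is valid utf-8; no validation is done here. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s; ++s) {
        if (utf8_is_lead((unsigned char)*s))
            ++n;
    }
    return n;
}

/* Return how many bytes the first max_chars characters of s occupy.
 * The scan stops at a character boundary, so a multi-byte sequence
 * is never split. Same validity assumption as above. */
static size_t utf8_prefix_bytes(const char *s, size_t max_chars)
{
    size_t bytes = 0;
    size_t chars = 0;

    while (s[bytes]) {
        if (utf8_is_lead((unsigned char)s[bytes])) {
            if (chars == max_chars)
                break;          /* next character would exceed the limit */
            ++chars;
        }
        ++bytes;
    }
    return bytes;
}
```

For the mod_autoindex case, a caller wanting 20 characters of description
would copy utf8_prefix_bytes(desc, 20) bytes, whether that turns out to be
20 bytes or 40.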
