Brane, if that's sufficient, would you propose a rewording patch to dispel
any confusion about how the API behaves?

I understand where your confusion arose, but the UCS-2 vs. UTF-16
distinction was already a thing when I wrote the code. What I didn't
reflect in the comments is that we reject bad UTF-16 as readily as we
reject bad UTF-8, even as Java kept accepting it, with some very serious
consequences (yes, I was the first reporter, so far as I know).
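
For illustration only, this is roughly what "rejecting bad UTF-16" amounts
to. It is a minimal sketch, not the actual APR conversion code: the helper
name utf16_is_well_formed is hypothetical, and it only checks surrogate
pairing rather than performing any conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of APR): returns 1 if in[0..len) is
 * well-formed UTF-16, or 0 if it contains an unpaired surrogate. */
static int utf16_is_well_formed(const uint16_t *in, size_t len)
{
    size_t i = 0;

    assert(in != NULL || len == 0);

    while (i < len) {
        uint16_t u = in[i];

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow immediately. */
            if (i + 1 >= len || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return 0;
            i += 2;
        }
        else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return 0;
        }
        else {
            /* Ordinary BMP code unit, which is also valid UCS-2. */
            i += 1;
        }
    }
    return 1;
}
```

A stream with no surrogates at all passes unchanged, which is why code
that started out as "UCS-2 with surrogate pair support" ends up handling
UTF-16 once the pairing check above is enforced.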



On Wed, Jun 12, 2019 at 8:15 PM Branko Čibej <br...@apache.org> wrote:

> On 12.06.2019 16:47, William A Rowe Jr wrote:
> > On Tue, Jun 11, 2019 at 5:38 PM William A Rowe Jr <wr...@rowe-clan.net
> > <mailto:wr...@rowe-clan.net>> wrote:
> >
> >     On Tue, Jun 11, 2019 at 1:44 PM Branko Čibej <br...@apache.org
> >     <mailto:br...@apache.org>> wrote:
> >
> >>          We either reserve about 2x buffers for file name
> >>         transliteration in heap
> >>         per thread, or we use the thread stack. As long as we trust
> >>         that our utf-8
> >>         to ucs-2 logic is rock solid and the allocations and limits
> >>         are correctly
> >>         coded, this continues to be a safe approach.
> >
> >         Apropos of that, for 2.0 we're about to or have already
> >         ditched support for versions of Windows that do not have
> >         native UTF-8/UTF-16 conversions (ah, yes ... Windows has
> >         finally moved from UCS-2 to UTF-16). Wouldn't this be the
> >         right time to switch to using Windows' functions instead of
> >         staying with our own? Especially since, with the transition to
> >         UTF-16, we have to deal correctly with surrogate pairs,
> >         something our current code (IIRC) doesn't do.
> >
> >
> >     A bit of a misnomer, the code is full of references to ucs-2
> >     w/surrogate pair
> >     support, the combo of these is utf-16. The comments can be
> >     refreshed to
> >     today's utf-16 nomenclature.
> >
> >
> > How strongly do we feel about the naming? I have the following patch to
> > commit if we want to observe current convention in utf-naming and
> > deprecate
> > many (but not all) ucs references.
>
> As far as I'm concerned, the naming (which is an internal matter, not
> part of the API) can stay as long as the implementation is correct. A
> comment to the tune of "it's actually UTF-16 but started life in the
> olden days when only UCS-2 existed" is good enough.
>
> -- Brane
>
>
