Brane, if that's sufficient, would you propose a rewording patch to dispel any confusion about how the API behaves?
I understand where your confusion arose, but the UCS-2 vs. UTF-16 distinction already existed when I wrote the code. What I didn't reflect in the comments is that we reject malformed UTF-16 just as quickly as we reject malformed UTF-8, even as Java kept accepting it, with some very serious consequences (yes, I was the first reporter, so far as I know).

On Wed, Jun 12, 2019 at 8:15 PM Branko Čibej <br...@apache.org> wrote:

> On 12.06.2019 16:47, William A Rowe Jr wrote:
> > On Tue, Jun 11, 2019 at 5:38 PM William A Rowe Jr
> > <wr...@rowe-clan.net> wrote:
> >
> >     On Tue, Jun 11, 2019 at 1:44 PM Branko Čibej
> >     <br...@apache.org> wrote:
> >
> >>         We either reserve about 2x buffers for file name
> >>         transliteration in heap per thread, or we use the thread
> >>         stack. As long as we trust that our utf-8 to ucs-2 logic is
> >>         rock solid and the allocations and limits are correctly
> >>         coded, this continues to be a safe approach.
> >
> >         Apropos of that, for 2.0 we're about to or have already
> >         ditched support for versions of Windows that do not have
> >         native UTF-8/UTF-16 conversions (ah, yes ... Windows has
> >         finally moved from UCS-2 to UTF-16). Wouldn't this be the
> >         right time to switch to using Windows' functions instead of
> >         staying with our own? Especially since, with the transition to
> >         UTF-16, we have to deal correctly with surrogate pairs,
> >         something our current code (IIRC) doesn't do.
> >
> >
> >     A bit of a misnomer, the code is full of references to ucs-2
> >     w/surrogate pair support, the combo of these is utf-16. The
> >     comments can be refreshed to today's utf-16 nomenclature.
> >
> >
> > How strongly do we feel about the naming? I have the following patch to
> > commit if we want to observe current convention in utf-naming and
> > deprecate many (but not all) ucs references.
>
> As far as I'm concerned, the naming (which is an internal matter, not
> part of the API) can stay as long as the implementation is correct. A
> comment to the tune of "it's actually UTF-16 but started life in the
> olden days when only UCS-2 existed" is good enough.
>
> -- Brane
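
For anyone following the thread, here is a rough sketch of what "UCS-2 with surrogate pair support" plus strict rejection amounts to. This is not the APR code; the function name utf16_is_well_formed is hypothetical and exists only for illustration. It assumes the input is an array of 16-bit code units and treats any unpaired surrogate as an error rather than passing it through.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only, not the APR implementation.
 * Returns 1 if the buffer holds well-formed UTF-16, 0 otherwise. */
int utf16_is_well_formed(const uint16_t *in, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint16_t u = in[i];

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow immediately. */
            if (i + 1 >= len || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return 0;
            i += 2;
        }
        else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return 0;
        }
        else {
            /* Ordinary BMP code unit, valid in both UCS-2 and UTF-16. */
            i += 1;
        }
    }
    return 1;
}
```

A converter built on this rule would stop and report an error at the first unpaired surrogate, which is the "throw away bad input" behavior described above, as opposed to silently carrying the bad code unit into the output.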