On Tue, Jun 11, 2019 at 1:44 PM Branko Čibej <br...@apache.org> wrote:

>  We either reserve about 2x buffers for file name transliteration in heap
> per thread, or we use the thread stack. As long as we trust that our utf-8
> to ucs-2 logic is rock solid and the allocations and limits are correctly
> coded, this continues to be a safe approach.
>
>
> Apropos of that, for 2.0 we're about to or have already ditched support
> for versions of Windows that do not have native UTF-8/UTF-16 conversions
> (ah, yes ... Windows has finally moved from UCS-2 to UTF-16). Wouldn't this
> be the right time to switch to using Windows' functions instead of staying
> with our own? Especially since, with the transition to UTF-16, we have to
> deal correctly with surrogate pairs, something our current code (IIRC)
> doesn't do.
>

That's a bit of a misnomer. The code is full of references to ucs-2 with
surrogate pair support, and the combination of the two is utf-16. The
comments can be refreshed to today's utf-16 nomenclature.
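
To illustrate why "ucs-2 plus surrogate pairs" amounts to utf-16, here is a
minimal sketch of the encoding step. The helper name encode_utf16 is
hypothetical and this is not the code in our tree; it only shows the
arithmetic defined by the Unicode standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the project's code): encode one Unicode code
 * point as utf-16. Returns the number of 16-bit units written (1 or 2),
 * or 0 if cp is not a valid Unicode scalar value. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                 /* out of range, or a bare surrogate */

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;    /* BMP: all that plain ucs-2 can express */
        return 1;
    }

    cp -= 0x10000;                /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Code points below U+10000 take the single-unit path, which is exactly ucs-2;
everything above takes the two-unit path, and the two together are utf-16.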

Today's logic remains correct, and of course it handles surrogates
properly, because accepting an unpaired surrogate value from utf-8 input
would be very broken and possibly even a security issue, much as decoding
other invalid utf-8 byte streams proved to be.
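
For anyone who wants to see the kind of checks this implies, below is a
rough sketch of a strict single-code-point utf-8 decoder. It is an
illustration under my own assumptions, not our implementation, and the name
decode_utf8 is made up. It rejects overlong forms, encoded surrogates, and
values beyond U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strict decoder (not the project's code). Decodes one code
 * point from s[0..len) into *cp. Returns the number of bytes consumed,
 * or 0 if the input is invalid or truncated. */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 2, 3, or 4 bytes. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t v;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];                          /* plain ASCII */
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; v = s[0] & 0x07; }
    else
        return 0;                            /* stray continuation or bad lead */

    if (len < need)
        return 0;                            /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                        /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }

    if (v < min_cp[need])
        return 0;                            /* overlong encoding */
    if (v >= 0xD800 && v <= 0xDFFF)
        return 0;                            /* surrogate encoded in utf-8 */
    if (v > 0x10FFFF)
        return 0;                            /* beyond the Unicode range */

    *cp = v;
    return need;
}
```

The overlong check is what stops a two-byte encoding of '/' from slipping
past a path filter, and the surrogate check is what keeps an unpaired
surrogate from ever reaching the utf-16 side.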

If you want to look at the Win32 APIs, feel free to benchmark them, though
I doubt they would outperform the current implementation.
