Hi Erik,

This looks good to me. I would just suggest forcing NOREVBOM
to zero when ALLOW_BROKEN_SURROGATES is set.

NOREVBOM exists only because I could not find a
reference for how to process a BOM in a file name. If we
want to support bad codes, BOMs must not be rejected.
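
A minimal sketch of that suggestion, not taken from the patch. It assumes
both options are 0/1 macros tested with #if, and that NOREVBOM gates the
rejection of the byte-swapped BOM code unit 0xFFFE. The helper
unit_is_rejected() is hypothetical and only illustrates the effect.

```c
/* Sketch only: assumes both options are 0/1 macros tested with #if. */
#ifndef ALLOW_BROKEN_SURROGATES
#define ALLOW_BROKEN_SURROGATES 1	/* assumed enabled for this example */
#endif

#if ALLOW_BROKEN_SURROGATES
/* Broken names are accepted, so BOM code units must be accepted too. */
#undef NOREVBOM
#define NOREVBOM 0
#elif !defined(NOREVBOM)
#define NOREVBOM 1
#endif

/* Hypothetical helper: returns 1 if a UTF-16 code unit would be rejected. */
static int unit_is_rejected(unsigned int wc)
{
#if NOREVBOM
	if (wc == 0xFFFE)	/* byte-swapped BOM, assumed target of NOREVBOM */
		return 1;
#else
	(void)wc;		/* nothing is rejected when NOREVBOM is 0 */
#endif
	return 0;
}
```

With ALLOW_BROKEN_SURROGATES enabled, NOREVBOM is forced to 0 whatever
value was set earlier, so the helper accepts every code unit.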

Regards

Jean-Pierre

Erik Larsson wrote:
> Hi Jean-Pierre,
>
> On 2016-04-07 16:52, Jean-Pierre André wrote:
>> Erik Larsson wrote:
>>> Hi,
>>>
>>> On 2016-04-06 19:22, Jean-Pierre André wrote:
>>>> Erik Larsson wrote:
>>
>> [...]
>>
>>> I have a proposal that would enable accessing these broken files in
>>> ntfs-3g and the progs. The proposal involves encoding broken surrogate
>>> UTF-16 units into their own separate 3-byte UTF-8 sequences. This is
>>> sometimes referred to by the acronym WTF-8 (see:
>>> https://en.wikipedia.org/wiki/UTF-8#WTF-8 ).
>>>
>>> The effect is that these files aren't ignored as in the previous
>>> proposed patch but are included in the listing and can be looked up as
>>> any other file since encoding broken UTF-16 to WTF-8 and then back to
>>> broken UTF-16 is lossless, though the UTF-8 byte sequences returned to
>>> user aren't fully Unicode compliant.
>>> However I think this is the best we can do without starting to
>>> manufacture fake file names for these entries with all that complexity.
>>>
>>> Please review the attached patch.
>>
>> From your proposal, you apparently only have to fix the
>> processing of an isolated surrogate at the end of utf16
>> string.
>
> Thanks, I missed this case. I also noticed that you missed wrapping this
> in #if/#else/#endif.
> See attachments for my updated v2 patch which does this as well.
>
>> With this fix, my test of all possibilities appears to
>> run fine.
>
> Great.
>
> Best regards,
>
> - Erik



------------------------------------------------------------------------------
_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
