Erik,

Erik Larsson wrote:
Hi,

On 2016-04-06 19:22, Jean-Pierre André wrote:
Erik Larsson wrote:

[...]

I have a proposal that would make these broken files accessible in
ntfs-3g and ntfsprogs. The proposal is to encode each unpaired UTF-16
surrogate unit as its own separate 3-byte UTF-8 sequence. This scheme
is sometimes referred to as WTF-8 (see:
https://en.wikipedia.org/wiki/UTF-8#WTF-8 ).

The effect is that these files are no longer ignored, as they were with
the previously proposed patch. Instead they are included in the listing
and can be looked up like any other file, because encoding broken UTF-16
to WTF-8 and then back to UTF-16 is lossless. The catch is that the
UTF-8 byte sequences returned to the user aren't fully Unicode compliant.
However, I think this is the best we can do without starting to
manufacture fake file names for these entries, with all the complexity
that would bring.

Please review the attached patch.

From your proposal, it appears you only have to fix the
processing of an isolated surrogate at the end of a UTF-16
string.

With this fix, my test of all possibilities appears to
run fine.

See attachment.

Regards

Jean-Pierre


Best regards,

- Erik

Attachment: unistr.patch
Description: application/download

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
