Erik,

Erik Larsson wrote:
Hi,

On 2016-04-06 19:22, Jean-Pierre André wrote:
Erik Larsson wrote:
[...]
I have a proposal that would make these broken files accessible in ntfs-3g and the progs. It encodes each broken (unpaired) UTF-16 surrogate unit as its own separate 3-byte UTF-8 sequence, a scheme sometimes referred to as WTF-8 (see: https://en.wikipedia.org/wiki/UTF-8#WTF-8 ). As a result, these files are no longer ignored, as they were with the previously proposed patch. Instead they are included in directory listings and can be looked up like any other file, because converting broken UTF-16 to WTF-8 and back to broken UTF-16 is lossless. The drawback is that the UTF-8 byte sequences returned to the user are not fully Unicode compliant. Still, I think this is the best we can do short of manufacturing fake file names for these entries, with all the complexity that would bring. Please review the attached patch.
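The lossless round trip described above can be sketched in C. This is a minimal illustration under stated assumptions, not the code from either patch in this thread: `wtf8_encode`, `wtf8_decode` and `WTF8_ERR` are hypothetical names, and the decoder assumes its input was produced by the encoder (it does not reject malformed sequences). A valid surrogate pair becomes one 4-byte sequence, while an unpaired surrogate is encoded as a 3-byte sequence exactly as if it were an ordinary BMP code point, so decoding restores the original UTF-16 units.

```c
#include <stddef.h>
#include <stdint.h>

#define WTF8_ERR ((size_t)-1)   /* hypothetical error marker */

/* Hypothetical helper, not the ntfs-3g API: encode len UTF-16 units as
 * WTF-8 into out (capacity cap). A valid surrogate pair becomes one
 * 4-byte sequence; an unpaired surrogate becomes its own 3-byte sequence.
 * Returns the number of bytes written, or WTF8_ERR if out is too small. */
static size_t wtf8_encode(const uint16_t *in, size_t len,
                          unsigned char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = in[i];
        /* Combine high + low surrogate, checking the bound first. */
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00u);
            i++;
        }
        if (c < 0x80) {
            if (o + 1 > cap) return WTF8_ERR;
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            if (o + 2 > cap) return WTF8_ERR;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            /* BMP, including lone surrogates: the WTF-8 case. */
            if (o + 3 > cap) return WTF8_ERR;
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            if (o + 4 > cap) return WTF8_ERR;
            out[o++] = (unsigned char)(0xF0 | (c >> 18));
            out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}

/* Hypothetical inverse: decode WTF-8 back to UTF-16 units (capacity cap).
 * Minimal sketch: assumes well-formed input from wtf8_encode and only
 * guards against truncated sequences and a too-small output buffer. */
static size_t wtf8_decode(const unsigned char *in, size_t len,
                          uint16_t *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < len) {
        unsigned char b = in[i];
        size_t n = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        uint32_t c;
        if (i + n > len)
            return WTF8_ERR;              /* truncated sequence */
        switch (n) {
        case 1:  c = b; break;
        case 2:  c = ((uint32_t)(b & 0x1F) << 6) | (in[i + 1] & 0x3Fu); break;
        case 3:  c = ((uint32_t)(b & 0x0F) << 12)
                     | ((uint32_t)(in[i + 1] & 0x3F) << 6) | (in[i + 2] & 0x3Fu); break;
        default: c = ((uint32_t)(b & 0x07) << 18)
                     | ((uint32_t)(in[i + 1] & 0x3F) << 12)
                     | ((uint32_t)(in[i + 2] & 0x3F) << 6) | (in[i + 3] & 0x3Fu); break;
        }
        i += n;
        if (c >= 0x10000) {               /* split back into a surrogate pair */
            if (o + 2 > cap) return WTF8_ERR;
            c -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (c >> 10));
            out[o++] = (uint16_t)(0xDC00 | (c & 0x3FF));
        } else {
            if (o + 1 > cap) return WTF8_ERR;
            out[o++] = (uint16_t)c;       /* lone surrogates come back unchanged */
        }
    }
    return o;
}
```

For example, the units 0x0041 0xD800 0x0042 encode to the bytes 41 ED A0 80 42, where ED A0 80 is the lone surrogate U+D800; decoding those bytes yields the same three units.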
From your proposal, it appears you only have to fix the processing of an isolated surrogate at the end of a UTF-16 string. With this fix, my test of all possibilities appears to run fine. See attachment.

Regards

Jean-Pierre
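The edge case mentioned here is a high surrogate in the final position: code that looks ahead for a matching low surrogate must check the bound before reading the next unit, or it reads one unit past the end of the string. A minimal C sketch of the guarded look-ahead; `wtf8_length` is a hypothetical helper that counts the WTF-8 bytes needed for a UTF-16 string, not a function from ntfs-3g or from the attached patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the ntfs-3g API): number of WTF-8 bytes
 * needed for len UTF-16 units, lone surrogates included. */
static size_t wtf8_length(const uint16_t *in, size_t len)
{
    size_t bytes = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = in[i];
        if (c < 0x80)
            bytes += 1;
        else if (c < 0x800)
            bytes += 2;
        else if (c >= 0xD800 && c <= 0xDBFF
                 && i + 1 < len          /* a high surrogate may be the last unit */
                 && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            bytes += 4;                  /* valid pair: one supplementary code point */
            i++;                         /* consume the low surrogate too */
        } else
            bytes += 3;                  /* other BMP units, including any lone surrogate */
    }
    return bytes;
}
```

Without the `i + 1 < len` test, a string ending in a high surrogate would make the function read `in[len]`, which is outside the buffer.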
Best regards, - Erik
unistr.patch
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel