Hi Erik,

This looks good to me. I would just suggest forcing NOREVBOM to zero when ALLOW_BROKEN_SURROGATES is set.
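A minimal sketch of that suggestion. It assumes that NOREVBOM and ALLOW_BROKEN_SURROGATES are 0/1 preprocessor switches and that NOREVBOM makes the converter reject the BOM-like code points U+FFFE and U+FFFF. Those are assumptions, and the default values and the `rejects_codepoint()` helper below are hypothetical, for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical defaults for illustration; the real definitions
 * live in the ntfs-3g sources. */
#ifndef ALLOW_BROKEN_SURROGATES
#define ALLOW_BROKEN_SURROGATES 1
#endif
#ifndef NOREVBOM
#define NOREVBOM 1
#endif

/* Suggested override: accepting broken surrogates implies that
 * BOM-like code points must not be rejected either. */
#if ALLOW_BROKEN_SURROGATES
#undef NOREVBOM
#define NOREVBOM 0
#endif

/* Hypothetical helper: returns nonzero if code point c would be
 * rejected in a file name (assumes NOREVBOM guards U+FFFE/U+FFFF). */
static int rejects_codepoint(uint32_t c)
{
	assert(c <= 0x10FFFF);	/* caller passes a Unicode code point */
#if NOREVBOM
	return c == 0xFFFE || c == 0xFFFF;
#else
	(void)c;		/* nothing is rejected */
	return 0;
#endif
}
```

With both switches forced this way, a file name containing U+FFFE converts instead of being refused, which matches the reasoning that follows.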
NOREVBOM only exists because I could not find a reference for how to process a BOM in a file name. If we want to support bad codes, BOMs must not be rejected.

Regards

Jean-Pierre

Erik Larsson wrote:
> Hi Jean-Pierre,
>
> On 2016-04-07 16:52, Jean-Pierre André wrote:
>> Erik Larsson wrote:
>>> Hi,
>>>
>>> On 2016-04-06 19:22, Jean-Pierre André wrote:
>>>> Erik Larsson wrote:
>>
>> [...]
>>
>>> I have a proposal that would enable accessing these broken files in
>>> ntfs-3g and the progs. The proposal involves encoding broken surrogate
>>> UTF-16 units into their own separate 3-byte UTF-8 sequences. This is
>>> sometimes referred to by the acronym WTF-8 (see:
>>> https://en.wikipedia.org/wiki/UTF-8#WTF-8 ).
>>>
>>> The effect is that these files aren't ignored as in the previous
>>> proposed patch but are included in the listing and can be looked up as
>>> any other file since encoding broken UTF-16 to WTF-8 and then back to
>>> broken UTF-16 is lossless, though the UTF-8 byte sequences returned to
>>> user aren't fully Unicode compliant.
>>> However I think this is the best we can do without starting to
>>> manufacture fake file names for these entries with all that complexity.
>>>
>>> Please review the attached patch.
>>
>> From your proposal, you apparently only have to fix the
>> processing of an isolated surrogate at the end of utf16
>> string.
>
> Thanks, I missed this case. I also noticed that you missed wrapping this
> in #if/#else/#endif.
> See attachments for my updated v2 patch which does this as well.
>
>> With this fix, my test of all possibilities appears to
>> run fine.
>
> Great.
>
> Best regards,
>
> - Erik

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
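The WTF-8 scheme discussed in the thread can be sketched in C as below. This is not the patch attached to the thread: `wtf8_encode()` and `wtf8_decode()` are hypothetical helpers working on plain `uint16_t` arrays rather than ntfs-3g's own types. Valid surrogate pairs become 4-byte UTF-8 sequences, while any unpaired surrogate, including one isolated at the end of the string (the case raised above), is emitted as its own 3-byte sequence, so converting back to UTF-16 is lossless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode len UTF-16 units into WTF-8. out must hold 3 * len bytes.
 * Returns the number of bytes written. */
static size_t wtf8_encode(const uint16_t *in, size_t len, unsigned char *out)
{
	size_t i = 0, o = 0;

	assert(out != NULL);
	while (i < len) {
		uint32_t c = in[i++];

		/* Join a high surrogate with a following low surrogate; the
		 * i < len test leaves a trailing high surrogate unpaired. */
		if (c >= 0xD800 && c <= 0xDBFF && i < len &&
		    in[i] >= 0xDC00 && in[i] <= 0xDFFF)
			c = 0x10000 + ((c - 0xD800) << 10) + (in[i++] - 0xDC00);

		if (c < 0x80) {
			out[o++] = (unsigned char)c;
		} else if (c < 0x800) {
			out[o++] = (unsigned char)(0xC0 | (c >> 6));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		} else if (c < 0x10000) {
			/* Also used for lone surrogates U+D800..U+DFFF. */
			out[o++] = (unsigned char)(0xE0 | (c >> 12));
			out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		} else {
			out[o++] = (unsigned char)(0xF0 | (c >> 18));
			out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
			out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	return o;
}

/* Decode WTF-8 back into UTF-16. out must hold len units.
 * Returns the number of units written, or (size_t)-1 if malformed. */
static size_t wtf8_decode(const unsigned char *in, size_t len, uint16_t *out)
{
	size_t i = 0, o = 0;

	assert(out != NULL);
	while (i < len) {
		uint32_t c = in[i];
		size_t n, k;

		if (c < 0x80) { n = 1; }
		else if ((c & 0xE0) == 0xC0) { n = 2; c &= 0x1F; }
		else if ((c & 0xF0) == 0xE0) { n = 3; c &= 0x0F; }
		else if ((c & 0xF8) == 0xF0) { n = 4; c &= 0x07; }
		else return (size_t)-1;
		if (n > len - i)
			return (size_t)-1;	/* truncated sequence */
		for (k = 1; k < n; k++) {
			if ((in[i + k] & 0xC0) != 0x80)
				return (size_t)-1;
			c = (c << 6) | (in[i + k] & 0x3F);
		}
		if (c > 0x10FFFF)
			return (size_t)-1;
		i += n;
		if (c >= 0x10000) {
			c -= 0x10000;
			out[o++] = (uint16_t)(0xD800 + (c >> 10));
			out[o++] = (uint16_t)(0xDC00 + (c & 0x3FF));
		} else {
			out[o++] = (uint16_t)c;	/* may be a lone surrogate */
		}
	}
	return o;
}
```

For example, encoding the units {0x0041, 0xD800, 0x0042, 0xD83D, 0xDE00, 0xDC00} yields 12 bytes: the lone surrogates appear as ED A0 80 and ED B0 80, the valid pair as the 4-byte sequence for U+1F600, and decoding restores the original six units. The decoder only rejects structurally malformed input; a production converter would also refuse overlong forms.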