Hi again,

Erik Larsson wrote:
> Hi Jean-Pierre,
>
> On 2016-04-12 13:37, Jean-Pierre André wrote:
>> Erik Larsson wrote:
>>> In that case maybe we should unify the two #defines into something like
>>> "ALLOW_BROKEN_UNICODE"?
>> Probably yes.
>>
>> In https://en.wikipedia.org/wiki/Specials_(Unicode_block)
>> the Unicode points U+FFFE and U+FFFF are qualified as
>> "noncharacters", so I guess they should not be present in
>> a file name. However we do not want to get stuck if this
>> happens, this is what you have proposed ALLOW_BROKEN_UNICODE
>> for, and it does not change the current behavior.
>
> I made this a separate patch... see patch 2 in attachments (patch 1
> should be the same as before).
>
> I was a bit confused because NOREVBOM was already set to 0. To reject
> the BOM code points in NTFS UTF-16 strings it would be set to 1, right?
> Thought the comment beside NOREVBOM said that you rejected the BOM code
> points, which is the opposite of what the code was actually doing...?

Right, the comment did not match the setting, but the intent
was to tolerate these codes (and the new patch does not change
the behavior). Naming a selector NOsomething leads to wrong
interpretations.

>
> Anyway, please review patch 2 carefully to make sure I didn't
> misunderstand anything.

Tested again, works as intended.

>
>>> Or we just keep them separate but make NOREVBOM 0 by default.
>> IMHO there is no real need, let us keep it simple.
>
> Sounds good. If you agree with these two patches I will push them to git.

Ok for me, please do.

>
> Best regards,
>
> - Erik
>
>>> On 2016-04-08 08:49, Jean-Pierre André wrote:
>>>> Hi Erik,
>>>>
>>>> This is good to me. I will just suggest forcing NOREVBOM
>>>> to zero when ALLOW_BROKEN_SURROGATES is set.
>>>>
>>>> NOREVBOM only exists because I could not find a
>>>> reference for how to process a BOM in a file name. If we
>>>> want to support bad codes, BOMs must not be rejected.
>>>>
>>>> Regards
>>>>
>>>> Jean-Pierre
>>>>
>>>> Erik Larsson wrote:
>>>>> Hi Jean-Pierre,
>>>>>
>>>>> On 2016-04-07 16:52, Jean-Pierre André wrote:
>>>>>> Erik Larsson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2016-04-06 19:22, Jean-Pierre André wrote:
>>>>>>>> Erik Larsson wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> I have a proposal that would enable accessing these broken files in
>>>>>>> ntfs-3g and the progs. The proposal involves encoding broken surrogate
>>>>>>> UTF-16 units into their own separate 3-byte UTF-8 sequences. This is
>>>>>>> sometimes referred to by the acronym WTF-8 (see:
>>>>>>> https://en.wikipedia.org/wiki/UTF-8#WTF-8 ).
>>>>>>>
>>>>>>> The effect is that these files aren't ignored as in the previous
>>>>>>> proposed patch but are included in the listing and can be looked up as
>>>>>>> any other file since encoding broken UTF-16 to WTF-8 and then back to
>>>>>>> broken UTF-16 is lossless, though the UTF-8 byte sequences returned to
>>>>>>> user aren't fully Unicode compliant.
>>>>>>> However I think this is the best we can do without starting to
>>>>>>> manufacture fake file names for these entries with all that
>>>>>>> complexity.
>>>>>>>
>>>>>>> Please review the attached patch.
>>>>>>
>>>>>> From your proposal, you apparently only have to fix the
>>>>>> processing of an isolated surrogate at the end of utf16
>>>>>> string.
>>>>>
>>>>> Thanks, I missed this case. I also noticed that you missed wrapping this
>>>>> in #if/#else/#endif.
>>>>> See attachments for my updated v2 patch which does this as well.
>>>>>
>>>>>> With this fix, my test of all possibilities appears to
>>>>>> run fine.
>>>>>
>>>>> Great.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> - Erik

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
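
For reference, the lossless round trip described in the quoted WTF-8 proposal can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the function names sketch_utf16_to_wtf8 and sketch_wtf8_to_utf16 are hypothetical, the buffers are assumed to be sized by the caller, and the decoder assumes its input was produced by the encoder shown here (it does no real validation). A well-formed surrogate pair becomes one 4-byte sequence, while an isolated surrogate, including one at the very end of the string, gets its own 3-byte sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the ntfs-3g API. */
static int is_high(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/*
 * Encode n UTF-16 units into out. The caller is assumed to provide
 * at least 3 * n bytes. Returns the number of bytes written.
 */
size_t sketch_utf16_to_wtf8(const uint16_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        /* Well-formed pair: combine into one supplementary code point. */
        if (is_high(in[i]) && i + 1 < n && is_low(in[i + 1])) {
            cp = 0x10000 + (((uint32_t)in[i] - 0xD800) << 10)
                         + ((uint32_t)in[i + 1] - 0xDC00);
            i++;
        }
        /* An isolated surrogate keeps its own value and takes the
         * ordinary 3-byte branch below. */
        if (cp < 0x80) {
            out[o++] = (uint8_t)cp;
        } else if (cp < 0x800) {
            out[o++] = (uint8_t)(0xC0 | (cp >> 6));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (uint8_t)(0xE0 | (cp >> 12));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (uint8_t)(0xF0 | (cp >> 18));
            out[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}

/*
 * Decode n bytes back into UTF-16 units (at most n units are written).
 * Assumes the bytes came from the encoder above; only truncation is
 * checked. Returns the number of units written.
 */
size_t sketch_wtf8_to_utf16(const uint8_t *in, size_t n, uint16_t *out)
{
    size_t o = 0, i = 0;
    while (i < n) {
        uint8_t b = in[i];
        size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        uint32_t cp;
        if (len > n - i)
            break; /* truncated sequence: stop instead of overreading */
        if (len == 1)
            cp = b;
        else if (len == 2)
            cp = b & 0x1F;
        else if (len == 3)
            cp = b & 0x0F;
        else
            cp = b & 0x07;
        /* Append six payload bits from each continuation byte. */
        for (size_t k = 1; k < len; k++)
            cp = (cp << 6) | (in[i + k] & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            /* Supplementary code point: split back into a pair. */
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 + (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            /* BMP unit, possibly an isolated surrogate, restored as is. */
            out[o++] = (uint16_t)cp;
        }
    }
    return o;
}
```

Because the encoder always merges a high surrogate that is directly followed by a low one, the decoder never sees two adjacent 3-byte surrogate sequences that should have been a pair, which is what keeps the conversion reversible in this sketch.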