Hi,

I looked into this image and noticed that there are 4 filenames in 
/WINDOWS/system32 that cannot be decoded.

One example is MFT entry 30661, with the filename (as UTF-16 units): 
0xDE5C 0xDC93 0x002E 0x006C 0x006F 0x0067
The filename ends with '.log', but the first two UTF-16 units are where 
Unicode decoding blows up. According to Wikipedia, 0xDE5C is a low 
surrogate (range: 0xDC00-0xDFFF), whereas we expect a high surrogate 
(0xD800-0xDBFF) to come first.
It is then followed by another low surrogate, 0xDC93. 
This is clearly corruption: a surrogate pair must consist of a 
high surrogate followed by a low surrogate.
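
To make the check concrete, here is a minimal sketch in Python (not 
ntfs-3g code; the function name find_unpaired_surrogates is made up for 
illustration). It walks the UTF-16 units and reports every unit that is 
not part of a valid high/low pair:

```python
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF  # high (leading) surrogates
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF    # low (trailing) surrogates


def find_unpaired_surrogates(units):
    """Return the indices of UTF-16 units that are unpaired surrogates."""
    bad = []
    i = 0
    while i < len(units):
        unit = units[i]
        if HIGH_FIRST <= unit <= HIGH_LAST:
            # A high surrogate is only valid if a low surrogate follows.
            if i + 1 < len(units) and LOW_FIRST <= units[i + 1] <= LOW_LAST:
                i += 2  # valid pair, skip both units
                continue
            bad.append(i)
        elif LOW_FIRST <= unit <= LOW_LAST:
            # A low surrogate that was not consumed by a preceding high one.
            bad.append(i)
        i += 1
    return bad


# Filename of MFT entry 30661, as quoted above.
name_units = [0xDE5C, 0xDC93, 0x002E, 0x006C, 0x006F, 0x0067]
print(find_unpaired_surrogates(name_units))  # [0, 1]
```

Both leading units are flagged, which matches what the decoder trips over.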

I have no idea how this file was created... if Windows did this, then we 
might need to cope with such corruption better (e.g. by ignoring the 
entry during readdir and just emitting a log message).
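
Roughly what I have in mind, again as a Python sketch rather than 
actual ntfs-3g code (decode_name, list_entries and the logger name are 
hypothetical, and the second MFT entry in the example is invented): 
decode each name strictly, and on failure skip the entry and log it 
instead of failing the whole directory listing.

```python
import logging
import struct

log = logging.getLogger("readdir_sketch")  # hypothetical logger name


def decode_name(units):
    """Strictly decode a list of UTF-16 units; raises on lone surrogates."""
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le")  # default error handling is 'strict'


def list_entries(entries):
    """entries: iterable of (mft_number, utf16_units). Returns decodable names."""
    names = []
    for mft_no, units in entries:
        try:
            names.append(decode_name(units))
        except UnicodeDecodeError as exc:
            # Skip the corrupt entry but leave a trace for the user.
            log.warning("skipping MFT entry %d: undecodable filename (%s)",
                        mft_no, exc)
    return names


entries = [
    (30661, [0xDE5C, 0xDC93, 0x002E, 0x006C, 0x006F, 0x0067]),  # corrupt
    (1, [0x0061, 0x002E, 0x006C, 0x006F, 0x0067]),              # made-up entry
]
print(list_entries(entries))  # ['a.log']
```

The downside is that the corrupt file becomes invisible and cannot be 
deleted or renamed through the mount, so this is a trade-off rather than 
a complete fix.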

Best regards,

- Erik

On 2016-04-06 13:06, Richard W.M. Jones wrote:
> The reporter kindly gave me permission to distribute the metadata
> file.  I've put it up here:
>
>    http://oirase.annexia.org/tmp/bz1301593/
>
>    $ md5sum ntfsclone_sda2.xz
>    6cadc64de3196311c8159dc12f84484c  ntfsclone_sda2.xz
>
> Rich.
>


------------------------------------------------------------------------------
_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel