Richard W.M. Jones wrote:
> A user of my program 'virt-v2v' is unable to convert their Windows
> Server 2003 guest. Apparently the Windows guest contains a malformed
> filename in C:\Windows\System32 which causes ntfs-3g to error when
> reading this directory. Anything that accesses this directory
> (eg. 'ls' or my program that uses readdir) fails with:
>
> ls: reading directory /sysroot/WINDOWS/system32: Invalid or incomplete
> multibyte or wide character
>
> (This is errno EILSEQ).
>
> Under Windows itself, the directory appears normal -- it is able to be
> listed and so on. There are two files with non-ASCII characters, but
> deleting both of them (using Windows) did not change the problem with
> ntfs-3g.
>
> I am not able to get a copy of the broken disk image, because it's
> 130GB in size.
>
> But I did manage to create a broken filesystem that behaves in a
> similar manner. I did that by hexediting a NTFS disk image to add an
> illegal UCS-2 character (U+DF00) to a filename. You can get that disk
> image by downloading the attachment here:

By default ntfs-3g treats NTFS file names as UTF-16LE, which means that
code points beyond U+FFFF are expected to be encoded as surrogate pairs
(units in the range U+D800..U+DFFF). Windows apparently makes no special
case for these codes... But an unpaired surrogate is not a valid Unicode
character and cannot be translated to valid UTF-8.

>
> https://bugzilla.redhat.com/show_bug.cgi?id=1301593#c6
>
> As far as I know, the real broken disk image was NOT created by
> hexediting or otherwise hacking the filesystem, but by some ordinary
> process on Windows (not yet understood nor reproduced).
>
> My question then is can we somehow ignore these files?

No way currently (save for hex-editing, or patching the translations in
libntfs-3g/unistr.c).

>
> Also, how does the locale setting affect ntfs-3g? Does it use the
> locale? Would a different LC_ALL setting affect how ntfs-3g might
> process a broken UCS-2 character? (I tried several LC_ALL settings,
> but with no apparent effect).

Setting a locale is discouraged because it frequently leads to errors.
The standard Linux translation by wctomb() checks whether the character
is valid with respect to the selected locale. The "locale" mount option
is still available (it has not yet been removed from the man page...).

>
> Because of the complexity of virt-v2v and the number of places where
> we want to read C:\Windows\System32 (including from external
> programs), working around this in our software is going to be
> difficult.

How come such names were created in the Windows system directory?
Do the names serve as binary hashcode values? If so, how would you
intend to use them as UTF-8?

Regards

Jean-Pierre

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel