Richard W.M. Jones wrote:
> A user of my program 'virt-v2v' is unable to convert their Windows
> Server 2003 guest.  Apparently the Windows guest contains a malformed
> filename in C:\Windows\System32 which causes ntfs-3g to error when
> reading this directory.  Anything that accesses this directory
> (eg. 'ls' or my program that uses readdir) fails with:
>
>    ls: reading directory /sysroot/WINDOWS/system32: Invalid or incomplete multibyte or wide character
>
> (This is errno EILSEQ).
>
> Under Windows itself, the directory appears normal -- it is able to be
> listed and so on.  There are two files with non-ASCII characters, but
> deleting both of them (using Windows) did not change the problem with
> ntfs-3g.
>
> I am not able to get a copy of the broken disk image, because it's
> 130GB in size.
>
> But I did manage to create a broken filesystem that behaves in a
> similar manner.  I did that by hexediting a NTFS disk image to add an
> illegal UCS-2 character (U+DF00) to a filename.  You can get that disk
> image by downloading the attachment here:

By default ntfs-3g treats NTFS file names as utf16-le,
which means Unicode code points beyond U+ffff are expected
to be encoded as surrogate pairs (two code units taken
from the range U+d800..U+dfff).

Windows apparently makes no special case for these
codes... But on their own they are not valid Unicode
code points, so an unpaired one such as U+df00 cannot
be translated to valid utf8.
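
To illustrate the point about unpaired surrogates, here is a minimal Python sketch. It is not the ntfs-3g code (the real translation lives in libntfs-3g/unistr.c), and the helper names `unpaired_surrogates` and `to_utf8` as well as the sample file names are made up for the example. It scans a utf16-le name for surrogates that are not part of a pair and shows that a strict conversion to utf8 refuses such a name:

```python
import struct


def unpaired_surrogates(raw: bytes) -> list[int]:
    """Return the indexes (in 16-bit units) of unpaired surrogates
    found in a utf16-le encoded name. Hypothetical helper."""
    count = len(raw) // 2
    units = struct.unpack("<%dH" % count, raw[: count * 2])
    bad = []
    i = 0
    while i < count:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < count and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            bad.append(i)
        i += 1
    return bad


def to_utf8(raw: bytes) -> bytes:
    """Strict utf16-le to utf8 conversion; raises on invalid input."""
    return raw.decode("utf-16-le").encode("utf-8")


# A name with a properly paired code point beyond U+ffff.
good = "a\U0001F600.txt".encode("utf-16-le")

# A name with a lone U+df00, similar to the hex-edited test image.
broken = (
    "ab".encode("utf-16-le")
    + struct.pack("<H", 0xDF00)
    + ".dll".encode("utf-16-le")
)

print(unpaired_surrogates(good))
print(unpaired_surrogates(broken))
try:
    to_utf8(broken)
except UnicodeDecodeError as exc:
    print("cannot translate:", exc.reason)
```

The well-formed name converts without complaint, while the name carrying the lone U+df00 is rejected, which is the same class of failure that ends up reported as EILSEQ.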

>
>    https://bugzilla.redhat.com/show_bug.cgi?id=1301593#c6
>
> As far as I know, the real broken disk image was NOT created by
> hexediting or otherwise hacking the filesystem, but by some ordinary
> process on Windows (not yet understood nor reproduced).
>
> My question then is can we somehow ignore these files?

No way currently... (short of hex-editing, or patching
the translations in libntfs-3g/unistr.c)

>
> Also, how does the locale setting affect ntfs-3g?  Does it use the
> locale?  Would a different LC_ALL setting affect how ntfs-3g might
> process a broken UCS-2 character?  (I tried several LC_ALL settings,
> but with no apparent effect).

Using the locale setting is discouraged because it
frequently leads to errors. The standard Linux translation
by wctomb() checks whether the character is valid with
respect to the selected locale.

The "locale" mount option is available (not yet removed
from the man page...).

>
> Because of the complexity of virt-v2v and the number of places where
> we want to read C:\Windows\System32 (including from external
> programs), working around this in our software is going to be
> difficult.

How come such names were created in the Windows
system directory? Do the names serve as binary
hash values? If so, how would you intend to use
them as utf-8?

Regards

Jean-Pierre

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
