Re: [ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files

Jean-Pierre André Sat, 21 Nov 2015 00:57:07 -0800

Hi Eric,

Eric Biggers wrote:
> Hi Jean-Pierre,
>
> I've made a few updates to the "system compression" branch.
>
> I finally got around to testing files with uncompressed size >= 4 GiB.  It
> turns out that Windows *does* permit system compression on such files.  The
> file format changes slightly to accommodate 64-bit offsets rather than 32-bit
> offsets (exactly the same as in WIM archives), so I updated the code
> accordingly.

Great!
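
As a rough illustration of the offset-width change described in the quoted text, here is a minimal sketch in C. It assumes the chunk table is a flat array of little-endian offsets, 4 bytes wide for files below 4 GiB and 8 bytes wide otherwise; the function names and the exact threshold are assumptions for illustration, not taken from the branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: entries are 32-bit below 4 GiB uncompressed, 64-bit otherwise. */
static size_t chunk_entry_size(uint64_t uncompressed_size)
{
    return (uncompressed_size >= (UINT64_C(1) << 32)) ? 8 : 4;
}

/* Read little-endian entry number `index` from a raw chunk table. */
static uint64_t read_chunk_offset(const uint8_t *table, size_t index,
                                  size_t entry_size)
{
    const uint8_t *p = table + index * entry_size;
    uint64_t value = 0;
    size_t i;

    for (i = 0; i < entry_size; i++)
        value |= (uint64_t)p[i] << (8 * i);
    return value;
}
```

Reading byte by byte keeps the sketch independent of host endianness and alignment.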

I have a question on the use of system compression on
Windows 10: I have two computers running Windows 10.
Both have the same disk size and same partition layout
(64GB for Win10, about 32GB used), but one uses system
compression and the other one does not. There have been
several major updates, and this difference in behavior
has remained unchanged.

Do you know why?

There have been suggestions that this might be due to
some hardware feature (in my case the CPU which uses
compression is an Intel quad-core 64-bit 2.4GHz, and the
one which does not is an AMD dual-core 64-bit 2.2GHz).

When using gzipped images for backup of system partitions,
compression is not a useful option (all the more so as the
saved space on the system partition is left unused).

> I added a check in ntfs_fuse_open() to forbid writing to the unnamed data
> stream of system compressed files, since it is not supported.  Such files are
> effectively read-only; the write bit is being cleared in the mode as well.  I
> suppose it would be possible to implement Windows' behavior where it
> automatically decompresses the file if you try to write to it, but I'm passing
> on that for now.
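
The open-time check and the cleared write bits described above might look roughly like the following sketch. The helper names are hypothetical, and the error code returned for a rejected open is an assumption; the email does not say which one the branch uses.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: reject any open that asks for write access to a
 * system compressed file. FUSE operations return a negated errno value;
 * -EOPNOTSUPP is an assumed choice here.
 */
static int check_open_flags(int is_system_compressed, int open_flags)
{
    if (is_system_compressed && (open_flags & O_ACCMODE) != O_RDONLY)
        return -EOPNOTSUPP;
    return 0;
}

/* Hypothetical helper: hide the write bits in the mode reported by stat(). */
static mode_t visible_mode(int is_system_compressed, mode_t mode)
{
    if (is_system_compressed)
        mode &= ~(mode_t)(S_IWUSR | S_IWGRP | S_IWOTH);
    return mode;
}
```
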
>
> I simplified chunk caching in the decompression context.  Now it just holds
> the most recently decompressed chunk, which should be good enough for library
> users who are unaware of the precise compression chunk size.  However, the
> FUSE driver still just opens the inode and allocates a new decompression
> context for every read.  Since the FUSE driver --- the high-level one, at
> least --- doesn't currently maintain file descriptor structures, there wasn't
> much that could be done.  But it does do big reads, as you mentioned.

The low level fuse interface does the same. By the
way, your (original) patch to the high level one can
be used in the low level one with just minor changes.
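
To illustrate the single-chunk cache mentioned in the quoted text, here is a minimal sketch. The struct layout, the function names and the fixed 4 KiB chunk size are assumptions, and the "decompression" is a stand-in that only fills the buffer so that cache hits can be counted. End-of-file handling is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* assumption: XPRESS4K-style 4 KiB chunks */

struct decomp_ctx {
    int64_t cached_chunk;       /* index of the chunk held in buf, or -1 */
    unsigned decompress_calls;  /* instrumentation for this sketch only */
    uint8_t buf[CHUNK_SIZE];
};

/* Stand-in for real decompression: fill the chunk with its index byte. */
static void fake_decompress_chunk(struct decomp_ctx *ctx, int64_t chunk)
{
    memset(ctx->buf, (int)(chunk & 0xff), CHUNK_SIZE);
    ctx->decompress_calls++;
}

static void ctx_init(struct decomp_ctx *ctx)
{
    ctx->cached_chunk = -1;
    ctx->decompress_calls = 0;
}

/* Copy `len` bytes starting at `offset`, decompressing only on a cache miss. */
static void ctx_read(struct decomp_ctx *ctx, uint64_t offset, size_t len,
                     uint8_t *out)
{
    while (len > 0) {
        int64_t chunk = (int64_t)(offset / CHUNK_SIZE);
        size_t in_chunk = (size_t)(offset % CHUNK_SIZE);
        size_t n = CHUNK_SIZE - in_chunk;

        if (n > len)
            n = len;
        if (chunk != ctx->cached_chunk) {
            fake_decompress_chunk(ctx, chunk);
            ctx->cached_chunk = chunk;
        }
        memcpy(out, ctx->buf + in_chunk, n);
        out += n;
        offset += n;
        len -= n;
    }
}
```

With a context that outlives a single read, consecutive small reads inside one chunk cost one decompression; with a fresh context per read, as in the FUSE driver today, the cache only helps within that one read.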

> (Side note: in the FUSE filesystem I have in wimlib for mounting WIM images, I
> set the 'fh' member of the 'struct fuse_file_info' to a file descriptor
> structure in the ->open() operation, and I have 'flag_nullpath_ok' set in the
> 'struct fuse_operations'.  Then, I just get the file descriptor structure,
> with no path, passed to operations such as ->read().  If something like that
> could be done with NTFS-3g and objects like inodes could be left open for many
> reads or writes, I expect it would make things a bit faster for all users.
> Maybe it's not possible because you could end up with the same inode opened
> multiple times at once, in different file descriptors...)
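
The 'fh' technique in the side note can be sketched without linking against libfuse. In the sketch below, `struct file_info_stub` stands in for the single 64-bit `fh` field of `struct fuse_file_info`; `struct open_file` and the `sketch_*` functions are hypothetical names, not wimlib or NTFS-3g code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the one field of libfuse's struct fuse_file_info used here. */
struct file_info_stub {
    uint64_t fh;
};

/* Hypothetical per-open state; a real driver would keep an open inode here. */
struct open_file {
    uint64_t inode_number;
    unsigned reads;
};

/* ->open(): allocate the per-open state once and park it in fh. */
static int sketch_open(uint64_t inode_number, struct file_info_stub *fi)
{
    struct open_file *of = calloc(1, sizeof(*of));

    if (!of)
        return -ENOMEM;
    of->inode_number = inode_number;
    fi->fh = (uint64_t)(uintptr_t)of;
    return 0;
}

static struct open_file *sketch_handle(const struct file_info_stub *fi)
{
    return (struct open_file *)(uintptr_t)fi->fh;
}

/* ->read(): no path lookup, the state comes straight from fh. */
static int sketch_read(struct file_info_stub *fi)
{
    sketch_handle(fi)->reads++;
    return 0;
}

/* ->release(): free the state when the descriptor is closed. */
static int sketch_release(struct file_info_stub *fi)
{
    free(sketch_handle(fi));
    fi->fh = 0;
    return 0;
}
```

Two opens of the same file would get two independent `open_file` structures here, which is exactly the multiple-descriptors-per-inode situation raised at the end of the side note.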

Actually I tried to keep inodes open a few years
ago, though I am not sure I investigated having
multiple descriptors to the same inode.

I finally gave up because the result was worse for
very fragmented files (which is frequent for videos
or files compressed the old way). This is because new
parts of the run list are decompressed into memory
and not discarded until the file is released (which
means all descriptors are closed).

I then added a count of entries in the runlist to be
able to quickly find a needed entry. This significantly
improved access to fragmented files, but worsened
access to less fragmented ones.
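
One way a known entry count can speed up lookups is binary search over the runs, sketched below. This is only an illustration of the idea: the message does not say which search method was tried, and `struct run` and `find_run` are hypothetical names (the fields mirror the usual vcn/lcn/length triple of a runlist element).

```c
#include <stddef.h>
#include <stdint.h>

struct run {
    int64_t vcn;     /* first virtual cluster covered by this run */
    int64_t lcn;     /* first logical cluster on disk */
    int64_t length;  /* number of clusters in the run */
};

/*
 * Return the index of the run containing `vcn`, or -1 if none does.
 * Requires the runs to be sorted by vcn and `count` to be known,
 * which is what keeping an entry count makes possible.
 */
static ptrdiff_t find_run(const struct run *rl, size_t count, int64_t vcn)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (vcn < rl[mid].vcn)
            hi = mid;
        else if (vcn >= rl[mid].vcn + rl[mid].length)
            lo = mid + 1;
        else
            return (ptrdiff_t)mid;
    }
    return -1;
}
```

For a runlist with only a handful of entries a linear scan from the start is already cheap, so any extra bookkeeping is pure overhead, which is consistent with the slowdown reported for less fragmented files.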

So I finally ended up with caching the inodes so that
reopening an inode which has been opened recently is
fast. The stat() data are cached, but the runlists are
not.

About integrating system compression into ntfs-3g, I
want to try making it a plugin loaded on demand. This
will improve modularity and facilitate maintenance.
For instance an upgrade such as the one you are
announcing could be released independently of ntfs-3g.
In the fuse interface (ntfs-3g.c or lowntfs-3g.c) the
inode would be opened, and if it has a reparse mark,
control would be passed to a handler depending on the
reparse tag, either internal (symlinks and junctions)
or external (system compression). The interface would
be like the fuse one, with the path replaced by a pointer
to the inode descriptor. More thoughts needed...
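
A rough sketch of the dispatch idea, assuming a static table in place of real on-demand loading. The reparse tag values are the standard Windows ones; `struct inode_stub`, `struct reparse_handler` and the handler functions are hypothetical, and the stub reads only write a marker byte so the routing can be observed.

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_MOUNT_POINT UINT32_C(0xA0000003)  /* junctions */
#define TAG_SYMLINK     UINT32_C(0xA000000C)  /* symbolic links */
#define TAG_WOF         UINT32_C(0x80000017)  /* system compression */

/* Hypothetical stand-in for an open inode descriptor. */
struct inode_stub {
    uint32_t reparse_tag;
};

/* FUSE-like operation, with the path replaced by an inode pointer. */
struct reparse_handler {
    uint32_t tag;
    int external;  /* nonzero if the handler would come from a plugin */
    int (*read)(struct inode_stub *ni, char *buf, size_t size, int64_t offset);
};

static int internal_read(struct inode_stub *ni, char *buf, size_t size,
                         int64_t offset)
{
    (void)ni; (void)offset;
    if (size == 0)
        return 0;
    buf[0] = 'i';  /* marker: handled inside the driver */
    return 1;
}

static int plugin_read(struct inode_stub *ni, char *buf, size_t size,
                       int64_t offset)
{
    (void)ni; (void)offset;
    if (size == 0)
        return 0;
    buf[0] = 'p';  /* marker: handled by the (simulated) plugin */
    return 1;
}

static const struct reparse_handler handlers[] = {
    { TAG_MOUNT_POINT, 0, internal_read },
    { TAG_SYMLINK,     0, internal_read },
    { TAG_WOF,         1, plugin_read },
};

/* Pick the handler matching the inode's reparse tag, or NULL if unknown. */
static const struct reparse_handler *find_handler(const struct inode_stub *ni)
{
    size_t i;

    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
        if (handlers[i].tag == ni->reparse_tag)
            return &handlers[i];
    return NULL;
}
```

In an actual plugin design the external entry would be filled in after loading a shared object on first use; the table lookup itself would stay the same.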

> Finally, I made a few other code cleanups and added a short subsection to the
> ntfs-3g man page.
>
> Eric
>
>
> On Tue, Sep 22, 2015 at 10:54:10PM -0500, Eric Biggers wrote:
>> I've pushed changes to my repository that address a few things you brought
>> up:
>>
>> - compiler warnings addressed
>> - decompression memory allocated on heap rather than stack
>> - a couple optimizations for decompression speed
>>
>> I'll take a closer look at the interaction with the NTFS-3g driver when I
>> have time.
>>
>>
>>
>> On Tue, Sep 22, 2015 at 10:49 PM, Eric Biggers <ebigge...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> "WOF compression" is as good as the other names.  It still seems slightly
>>> wrong because WOF (the "Windows Overlay Filesystem Filter") is a more
>>> general feature, and this is actually the *second* compression technology
>>> that Microsoft has built on top of it (the first was "WIMBoot").  For now,
>>> I'll keep the code the way it is, using the "system compression" name.  It
>>> could be that Microsoft will release more documentation for this.
>>>
>>> Yes, your reparse data indicates XPRESS4K compression (the fourth 32-bit
>>> little endian word is 0).  FYI, here are the compressed sizes I get with
>>> the Silesia corpus (uncompressed size: 211,938,580 bytes total):
>>>
>>> LZNT1 (NTFS compression): 121,049,088 bytes
>>> XPRESS4K: 104,124,416 bytes
>>> XPRESS8K: 95,465,472 bytes
>>> XPRESS16K: 90,460,160 bytes
>>> LZX: 69,144,576 bytes
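
The reparse-data check mentioned above (fourth 32-bit little-endian word) could be sketched as follows. Only the value 0 for XPRESS4K is confirmed by the email; the other three values follow the Windows file-provider compression constants as I recall them and should be treated as an assumption, as should the premise that `data` starts at the reparse payload rather than at the generic reparse point header. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one 32-bit little-endian word. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return the algorithm name encoded in the fourth 32-bit word of the
 * reparse data, or NULL if the data is too short or the value is unknown.
 */
static const char *wof_algorithm_name(const uint8_t *data, size_t len)
{
    /* Index 0 is from the email; indexes 1..3 are assumed values. */
    static const char *const names[] = {
        "XPRESS4K", "LZX", "XPRESS8K", "XPRESS16K"
    };
    uint32_t algorithm;

    if (len < 16)
        return NULL;
    algorithm = le32(data + 12);
    return algorithm < 4 ? names[algorithm] : NULL;
}
```
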
>>>
>>> Even though FUSE makes big reads, it would be nice to not have to allocate
>>> a decompression context for every read.  That would avoid doing all of the
>>> following on a per-read basis:
>>> - open WofCompressedData attribute
>>> - allocate heap memory for ntfs_system_decompression_ctx
>>> - allocate heap memory for XPRESS or LZX
>>> - read chunk offsets from the compressed file's chunk table
>>>
>>> Having an external tool to create "system compressed" files, if people
>>> want that support, is probably the way to go.  Probably that would be
>>> possible even with no changes in libntfs-3g.
>>>
>>> Eric
>>>
>>>
>
