Hi,

1) I do not know the best name to use.  Other names I have heard are
"Windows 10 compression", "executable compression", "compact mode", and even
simply "file compression".  I suppose it could also be called "XPRESS/LZX
compression" after the algorithms used, but that doesn't make much sense to
me because XPRESS and LZX are existing compression formats which have been
used for years.  The "system compression" feature is an *application* of
those formats rather than the formats themselves.  The feature was also
designed to be extensible, so that new compression formats can be added.

2) Were you using XPRESS4K, XPRESS8K, XPRESS16K, or LZX?  This has a big
effect on the compression ratio you observe.  XPRESS4K is not a good choice,
even though I think Microsoft is using it more often than the others.

XPRESS and LZX will almost always be slower than LZNT1 because LZNT1 is
byte-based, with no entropy coding, whereas XPRESS and LZX are bit-based
with entropy coding.  However, XPRESS and LZX can still be made very fast
and are well suited for modern processors.

It should be verified that data is not being decompressed more times than is
needed.  In system_compression.c there is a chunk cache to prevent exactly
this, but currently it is going unused because I didn't immediately see a way
to re-use the same decompression context from one read() to the next.  This
should be addressed if the code is going to be used for real.
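
As a rough illustration of the idea (this is not the code in
system_compression.c), a single-entry chunk cache might look like the sketch
below.  All names here (chunk_cache, chunk_cache_get, decompress_fn,
fake_decompress) are hypothetical, and the sketch assumes the cache is kept
in per-open-file state so that it survives from one read() to the next.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: decompress chunk `idx` of `file` into `out`
 * (chunk_size bytes).  Returns 0 on success, nonzero on error. */
typedef int (*decompress_fn)(void *file, uint64_t idx,
                             uint8_t *out, size_t chunk_size);

/* Hypothetical single-entry cache: remembers the last decompressed chunk. */
struct chunk_cache {
    uint8_t *data;       /* heap buffer holding one decompressed chunk */
    size_t chunk_size;   /* uncompressed chunk size, e.g. 8192 */
    uint64_t cached_idx; /* index of the chunk currently in `data` */
    int valid;           /* nonzero once `data` holds a real chunk */
};

static struct chunk_cache *chunk_cache_new(size_t chunk_size)
{
    struct chunk_cache *c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    c->data = malloc(chunk_size);
    if (!c->data) {
        free(c);
        return NULL;
    }
    c->chunk_size = chunk_size;
    c->cached_idx = 0;
    c->valid = 0;
    return c;
}

static void chunk_cache_free(struct chunk_cache *c)
{
    if (c) {
        free(c->data);
        free(c);
    }
}

/* Return the decompressed chunk containing byte `offset`, calling the
 * decompressor only when that chunk is not already cached. */
static const uint8_t *chunk_cache_get(struct chunk_cache *c, void *file,
                                      uint64_t offset, decompress_fn fn)
{
    uint64_t idx = offset / c->chunk_size; /* e.g. 1000000 / 8192 = 122 */

    if (!c->valid || c->cached_idx != idx) {
        c->valid = 0; /* buffer is about to be overwritten */
        if (fn(file, idx, c->data, c->chunk_size) != 0)
            return NULL;
        c->cached_idx = idx;
        c->valid = 1;
    }
    return c->data;
}

/* Stand-in decompressor for demonstration only: counts its calls through
 * `file` and fills the chunk with the low byte of the chunk index. */
static int fake_decompress(void *file, uint64_t idx,
                           uint8_t *out, size_t chunk_size)
{
    int *calls = file;
    (*calls)++;
    memset(out, (int)(idx & 0xff), chunk_size);
    return 0;
}
```

With 8192-byte chunks, two consecutive reads at offsets 1000000 and 1000100
both land in chunk 122, so only the first one would invoke the decompressor.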

There are a few decompression optimizations which I had removed to simplify
the code for inclusion in libntfs-3g.  If needed I can add some of these
back.  Also, certain functions such as read_huffsym() should be force-inlined.
This omission was unintentional, since in my projects I have the compiler
force-inline all functions marked 'inline'.
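
For illustration, one common way to get this behaviour without relying on a
project-wide compiler setting is a small macro.  The macro name forceinline
and the helper peek_bits() are hypothetical; peek_bits() is only a toy
stand-in for a hot-path function such as read_huffsym(), not its real
implementation.

```c
#include <stdint.h>

/* Hypothetical macro name.  GCC and Clang provide the always_inline
 * attribute, MSVC provides __forceinline; otherwise fall back to a hint. */
#if defined(__GNUC__) || defined(__clang__)
#  define forceinline inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define forceinline __forceinline
#else
#  define forceinline inline
#endif

/* Toy hot-path helper: return the top `n` bits of a 32-bit bit buffer.
 * `n` must be in the range 1..32. */
static forceinline unsigned peek_bits(uint32_t bitbuf, unsigned n)
{
    return bitbuf >> (32 - n);
}
```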

"System compression" is promoted by Microsoft because many if not most
files on real-world filesystems are only ever written one time.

3) That might be the case.

4) I'll plan to address the minor warnings first, then address the stack
usage separately by allocating a (reusable) decompression context for XPRESS
or LZX on the heap.

5) My code is proof-of-concept only, and I have not added all the necessary
protections, e.g. to prevent users from writing to the compressed files or
opening the WofCompressedData stream directly.  It will need to be carefully
considered how these files should be exposed via the FUSE driver and via
libntfs-3g directly.

Answer to last question: compressors for XPRESS and LZX would be almost
entirely new code, with very little shared with the decompressors.  They
should not be added to libntfs-3g unless there is demand for them.


On Mon, Sep 21, 2015 at 6:48 AM, Jean-Pierre André <
jean-pierre.an...@wanadoo.fr> wrote:

> Hi Eric,
>
> I have finally made a few tests of this feature,
> sorry for the delay.
>
> I have a few comments :
>
> 1) is "system compressed" the Microsoft name for this
> feature ? A name based on the algorithms used would be
> more discriminating.
>
> 2) poor compression improvement
>
> msvcrt.dll uncompressed      633768 bytes
> --------- ntfs compressed    438272 (69.2%)
> --------- system compressed  403296 (63.6%)
> --------- gzipped            303880 (47.9%)
>
> Profiling reading msvcrt.dll on x86_64 showed system compressed to be
> four times slower than traditional ntfs compressed, half the time being
> spent in read_huffsym(). These numbers are to be taken with care, as
> the test is not long enough.
>
> stack 12608 (traditional 2960)
> heap 273942 (traditional 244233)
>
> Moreover such files have to be written sequentially, so I
> wonder why this mode is promoted by Microsoft on Windows 10.
>
> 3) Such files can have an EA, though this is forbidden by Microsoft,
> according to :
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa364404(v=vs.85).aspx
> (Currently ntfs-3g follows the rule, overriding it might
> be needed).
>
> 4) Several (minor) compiler warnings sent privately.
>
> 5) Rough tests on x86 32 and 64 bits
> Checked ok the md5 of a few DLLs (against another computer which,
> for some reason, did not get system-compressed DLLs).
> lseek() and stat() are also fine, but there appears to be no
> protection against writing, appending, resizing...
>
> 6) Rough tests on a Sparc CPU
> A few quick tests of read(), lseek() and stat() ran fine, no
> endianness or alignment issue met.
>
> Finally, a question : is the decompressing code reversible
> and reusable for compressing, or is some mirror code required
> for creating files ?
>
> Jean-Pierre
>
>
> Eric Biggers wrote:
>
>> Hi,
>>
>> There is not too much information specifically about this feature
>> available yet.  You can try googling "Windows 10" "System compression" to
>> find some articles.  If you are looking for information about the data
>> format, it is not yet documented in the context of the system compression
>> feature but it seems that Microsoft lifted the format of the compressed
>> data directly from the Windows Imaging (WIM) file format.
>>
>> One way to create such files for testing is to use the Windows 10 version
>> of the "compact" program.  It has a new option for compressing files using
>> one of the new formats:
>>
>>         /exe:xpress4k
>>         /exe:xpress8k
>>         /exe:xpress16k
>>         /exe:lzx
>>
>> The format is designed for write-once, read-many files, such as executable
>> files.  If you try to write to such a file on Windows, Windows immediately
>> decompresses it and turns it into a standard uncompressed file.  There is
>> no need for manual cluster allocation as the feature is not implemented
>> directly in NTFS.
>>
>> However, for reading, the compressed files can be accessed randomly with
>> "chunk" granularity.  Each chunk can be decompressed independently.  If,
>> say, you want to read starting from byte offset 1000000 and the chunks are
>> 8192 bytes, then you know you need to read starting from chunk
>> (1000000/8192) = 122, rounding down.  Then you can load the offsets of
>> chunk 122, and any later chunks that may be needed, from the "chunk table"
>> at the beginning of the file.  Those will tell you where in the file the
>> chunks are and what their compressed sizes are.
>>
>> Eric
>>
>> On Thu, Jul 16, 2015 at 09:59:46AM +0200, Jean-Pierre André wrote:
>>
>>> Hi Eric,
>>>
>>> Interesting.
>>>
>>> Where can I find more information about this feature,
>>> and how can I create such files on Windows 10 ?
>>>
>>> Glancing at your code, I do not see anything related
>>> to (sparse) cluster allocation. Does that mean these
>>> files are not seekable and must be read/written
>>> sequentially ?
>>>
>>> Regards
>>>
>>> Jean-Pierre
>>>
>>> Eric Biggers wrote:
>>>
>>>> Hello,
>>>>
>>>> I've made an experimental fork of ntfs-3g that supports reading the
>>>> "System Compressed" files that are / will be supported by Windows 10.
>>>> This feature allows rarely-modified files to be stored using XPRESS or
>>>> LZX compression, with stronger compression than the LZNT1 compression
>>>> built into NTFS.  Windows 10 will supposedly enable it on selected files
>>>> automatically.
>>>>
>>>> Microsoft designed this feature to use a reparse point which redirects
>>>> access to a named data stream, which avoided changing NTFS itself.  The
>>>> format of the compressed stream is identical to that of a compressed
>>>> resource stored in a Windows Imaging (WIM) archive.
>>>>
>>>> I suspect it will be a while before NTFS-3g support would be useful to
>>>> more people and it ultimately may not be worthwhile adding it at all
>>>> (especially since this is a reparse-point based feature and therefore is
>>>> not part of NTFS itself, and it takes quite a bit of code to support),
>>>> but I thought I'd post this in case anyone else is interested.
>>>>
>>>> The source code is available as the "system_compression" branch of
>>>> https://github.com/ebiggers/ntfs-3g.git.
>>>>
>>>> Eric
>>>>
>>>
>>>
>>
>
>
------------------------------------------------------------------------------
_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
