Hi Eric,

I have finally made a few tests of this feature, sorry for the delay.

I have a few comments:

1) Is "system compressed" the Microsoft name for this feature? A name
based on the algorithms used would be more discriminating.

2) Poor compression improvement

msvcrt.dll
  uncompressed        633768 bytes
  ntfs compressed     438272 (69.2%)
  system compressed   403296 (63.6%)
  gzipped             303880 (47.9%)

Profiling reading msvcrt.dll on x86_64 showed system compressed
to be four times slower than traditional ntfs compressed, half the
time being spent in read_huffsym(). These numbers are to be taken
with care, as the test is not long enough.

  stack  12608 (traditional 2960)
  heap  273942 (traditional 244233)

Moreover such files have to be written sequentially, so I wonder
why this mode is promoted by Microsoft on Windows 10.

3) Such files can have an EA, though this is forbidden by
Microsoft, according to:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364404(v=vs.85).aspx
(Currently ntfs-3g follows the rule; overriding it might be needed.)

4) Several (minor) compiler warnings sent privately.

5) Rough tests on x86 32 and 64 bits

Checked ok the md5 of a few DLLs (against another computer which,
for some reason, did not get system-compressed DLLs).
lseek() and stat() are also fine, but there appears to be
no protection against writing, appending, resizing...

6) Rough tests on a Sparc CPU

A few quick tests of read(), lseek() and stat() ran fine,
no endianness or alignment issue met.

Finally, a question: is the decompressing code reversible
and reusable for compressing, or is some mirror code required
for creating files?

Jean-Pierre

Eric Biggers wrote:
> Hi,
>
> There is not too much information specifically about this feature available
> yet.  You can try googling "Windows 10" "System compression" to find some
> articles.
> If you are looking for information about the data format, it is not yet
> documented in the context of the system compression feature but it seems that
> Microsoft lifted the format of the compressed data directly from the Windows
> Imaging (WIM) file format.
>
> One way to create such files for testing is to use the Windows 10 version of
> the "compact" program.  It has a new option for compressing files using one of
> the new formats:
>
>     /exe:xpress4k
>     /exe:xpress8k
>     /exe:xpress16k
>     /exe:lzx
>
> The format is designed for write-once, read-many files, such as executable
> files.  If you try to write to such a file on Windows, Windows immediately
> decompresses it and turns it into a standard uncompressed file.  There is no
> need for manual cluster allocation as the feature is not implemented directly
> in NTFS.
>
> However, for reading, the compressed files can be accessed randomly with
> "chunk" granularity.  Each chunk can be decompressed independently.  If, say,
> you want to read starting from byte offset 1000000 and the chunks are 8192
> bytes, then you know you need to read starting from chunk (1000000/8192) = 122.
> Then you can load the offsets of chunks 122, and any later chunks that may be
> needed, from the "chunk table" at the beginning of the file.  Those will tell
> you where in the file the chunks are and what their compressed sizes are.
>
> Eric
>
> On Thu, Jul 16, 2015 at 09:59:46AM +0200, Jean-Pierre André wrote:
>> Hi Eric,
>>
>> Interesting.
>>
>> Where can I find more information about this feature,
>> and how can I create such files on Windows 10 ?
>>
>> Glancing at your code, I do not see anything related
>> to (sparse) cluster allocation. Does that mean these
>> files are not seekable and must be read/written
>> sequentially ?
>>
>> Regards
>>
>> Jean-Pierre
>>
>> Eric Biggers wrote:
>>> Hello,
>>>
>>> I've made an experimental fork of ntfs-3g that supports reading the "System
>>> Compressed" files that are / will be supported by Windows 10.  This feature
>>> allows rarely-modified files to be stored using XPRESS or LZX compression,
>>> with stronger compression than the LZNT1 compression built into NTFS.
>>> Windows 10 will supposedly enable it on selected files automatically.
>>>
>>> Microsoft designed this feature to use a reparse point which redirects
>>> access to a named data stream, which avoided changing NTFS itself.  The
>>> format of the compressed stream is identical to that of a compressed
>>> resource stored in a Windows Imaging (WIM) archive.
>>>
>>> I suspect it will be a while before NTFS-3g support would be useful to more
>>> people and it ultimately may not be worthwhile adding it at all (especially
>>> since this is a reparse-point based feature and therefore is not part of
>>> NTFS itself, and it takes quite a bit of code to support), but I thought I'd
>>> post this in case anyone else is interested.
>>>
>>> The source code is available as the "system_compression" branch of
>>> https://github.com/ebiggers/ntfs-3g.git.
>>>
>>> Eric
>>
>

------------------------------------------------------------------------------
_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
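
The chunk lookup described in the quoted reply (byte offset 1000000 with
8192-byte chunks gives chunk 122) can be sketched in C roughly as follows.
This is only an illustration of the index arithmetic under stated
assumptions: the names struct chunk_range and chunks_for_read() are
hypothetical and are not taken from the ntfs-3g fork, and the sketch does
not read or parse the chunk table itself, whose exact layout is not
described in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the chunks that cover one read request. */
struct chunk_range {
    uint64_t first; /* index of the first chunk to decompress */
    uint64_t last;  /* index of the last chunk to decompress */
    uint32_t skip;  /* bytes to skip at the start of the first chunk */
};

/*
 * Map a read of `length` bytes at byte `offset` of the uncompressed file
 * onto chunk indices, assuming fixed-size chunks of `chunk_size` bytes
 * (the last chunk of a file may be shorter, which does not change the
 * index arithmetic).
 */
static struct chunk_range
chunks_for_read(uint64_t offset, uint64_t length, uint32_t chunk_size)
{
    struct chunk_range r;

    assert(chunk_size > 0 && length > 0);
    r.first = offset / chunk_size;
    r.last = (offset + length - 1) / chunk_size;
    r.skip = (uint32_t)(offset % chunk_size);
    return r;
}
```

With offset 1000000 and 8192-byte chunks, first is 122 and skip is 576
(1000000 - 122 * 8192). A reader would then look up chunks first..last in
the chunk table, decompress each one independently, and discard the first
skip bytes of the first decompressed chunk.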