Hi Antony,

For now, I would not recommend trying to clone any MS
compressed filesystem. Remember that they themselves
have had licensing issues with their own compressed
filesystem. One thing which makes DoubleSpace and co.
complex is their ability to WRITE to the filesystem
while it is mounted / in use. Let me explain my own
suggestion for a simpler system:

- replace the 2nd FAT by an array of cluster sizes (FAT16)
  or cluster offsets (FAT32). Offsets are faster to look up,
  but with sizes you can buffer some intermediate offsets
  in RAM (e.g. the offset of every 256th cluster)

- clusters whose stored size equals the uncompressed cluster
  size are uncompressed clusters. Simple. Compressed clusters
  should be stored at offsets which are multiples of the sector
  size, to keep access times low and to simplify access...

- clusters which have size 0 are not in use. Simple.

- either you allow no writing at all, or you only allow
  writes for which the compressed cluster does not grow.
  If a write would make a cluster grow, you have to change
  the FAT to use a fresh cluster at the current end of the
  compressed filesystem. Such clusters ARE allowed to grow,
  but only if there are no used clusters at any later
  offset in the compressed filesystem.

- if you allow writing, do not actually shrink compressed
  clusters. Of course this can cause fragmentation. You
  can have a tool which collects all unused clusters of
  nonzero size and recycles the space they take, but this
  would happen while the filesystem is not active / mounted.

- you could use heuristics on initial filesystem creation /
  compression, for example to avoid compression of directories
  (this makes it easier for the directories to grow later).
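
To make the table idea above a bit more concrete, here is a
rough sketch in Python (not DOS driver code). All names are
made up for illustration. It assumes that sizes are stored in
sector units and that the table is indexed from 0, ignoring the
fact that real FAT cluster numbers start at 2:

```python
CHECKPOINT_EVERY = 256  # buffer the offset of every 256th cluster in RAM


def build_checkpoints(sizes, data_start=0):
    """Return the start offsets (in sectors) of clusters 0, 256, 512, ..."""
    checkpoints = []
    offset = data_start
    for index, size in enumerate(sizes):
        if index % CHECKPOINT_EVERY == 0:
            checkpoints.append(offset)
        offset += size
    return checkpoints


def cluster_offset(sizes, checkpoints, index):
    """Start sector of a cluster: nearest checkpoint plus at most 255 sizes."""
    base = index - index % CHECKPOINT_EVERY
    return checkpoints[base // CHECKPOINT_EVERY] + sum(sizes[base:index])


def cluster_state(size, sectors_per_cluster):
    """Classify a table entry by its stored size."""
    if size == 0:
        return "unused"
    if size == sectors_per_cluster:
        return "uncompressed"
    return "compressed"


def write_fits_in_place(sizes, index, new_size):
    """A cluster may be rewritten where it is if it does not grow.

    It may grow only if no used cluster is stored at a later offset.
    The recorded size is never reduced, so a smaller write just
    leaves some slack behind.
    """
    if new_size <= sizes[index]:
        return True
    return not any(sizes[index + 1:])
```

If write_fits_in_place() says no, the data has to go into a
fresh cluster at the end and the FAT chain has to be updated.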



I hope that such an implementation would be quite
straightforward and have good performance for read-only
access. It can be a FAT block device for DOS as "output" and
can take a file, partition or other data source as "input".
It would do the following transformations:

- first FAT and boot sector stay as they are

- access to 2nd FAT is redirected to first FAT

- access to clusters is redirected by reading the table
  explained above (which is stored at the location of the
  former 2nd FAT) to find out the actual starting sector /
  offset of the cluster in the compressed filesystem, and
  then reading and decompressing that cluster and returning
  the decompressed contents. Writing will be more complex
  but is not needed for a first implementation.

- note that clusters are ALWAYS in strict linear order,
  even with the suggested method which would allow writes
  to the filesystem as described above. That avoids any
  need for extra redirections like "cluster heaps" :-).
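
The read-only redirection described above could look roughly
like this. It is only a sketch: zlib is my stand-in for
whatever compressor is chosen, the function names are invented,
offsets and sizes are in sector units, and returning zeros for
unused clusters is an assumption:

```python
import zlib

SECTOR = 512


def redirect_fat_sector(sector, fat1_start, fat_sectors):
    """Map an access to the 2nd FAT onto the same sector of the 1st FAT."""
    fat2_start = fat1_start + fat_sectors
    if fat2_start <= sector < fat2_start + fat_sectors:
        return sector - fat_sectors
    return sector


def read_cluster(image, offsets, sizes, index, cluster_bytes):
    """Return the decompressed contents of one cluster.

    `image` holds the compressed cluster area as bytes.
    """
    size = sizes[index]
    if size == 0:
        # Unused cluster: nothing is stored (returning zeros is an assumption).
        return bytes(cluster_bytes)
    start = offsets[index] * SECTOR
    stored = image[start:start + size * SECTOR]
    if size * SECTOR == cluster_bytes:
        # Same size as an uncompressed cluster means it was stored raw.
        return bytes(stored)
    # decompressobj() stops at the end of the stream, so the padding
    # up to the next sector boundary is ignored.
    return zlib.decompressobj().decompress(stored)
```

With a size table instead of an offset table, offsets[index]
would be computed by summing sizes from the nearest buffered
offset.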



You also need a tool for initial compression / creation of
such a compressed filesystem. Luckily it can operate in the
same space as the uncompressed filesystem, replacing it...

- walk through all clusters, and copy them from their normal
  version into their compressed version. As clusters are never
  bigger than in their uncompressed state, this is as easy as
  successively overwriting all clusters with compressed data
  and filling the former 2nd FAT data area with the size info
  or offset info of all compressed clusters :-)

- this works for both FAT16 and FAT32, but I would not like
  the hassle of making it work for FAT12. Yet it would be
  possible. In FAT12, every entry is 1.5 bytes in size, and
  every "compressed size" entry will have to be 1 or 1.5
  bytes as well. Actually 1 byte is enough as all sizes are
  multiples of 1 sector, as suggested above, and one cluster
  is never more than 64 or 128 sectors (of 512 bytes) big.

- note that you can, in principle, save some space by always
  storing only compressed SIZES. Yet as we have as much space
  as the 2nd FAT took, it is more efficient to store 32bit
  OFFSETS (in sector size units: up to 2 terabytes) for FAT32.
  Then you know at once where a compressed cluster starts,
  without having to sum up sizes :-).

- the input filesystem can be an actual partition on disk
  or the FAT filesystem of a RAMDISK, does not really matter

- the output can be stored on the source partition or on
  the ramdisk, but you will typically download it into a
  file when the compression is complete, as such a compressed
  filesystem image is smaller than the original filesystem.

- I suggest using the source disk as the output target for
  the compression tool and use general tools like DISKCOPY
  to copy the compressed filesystem to a file when done...

- of course you must not write to a filesystem WHILE it is
  being processed by the compression tool :-p.
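
The in-place compression pass described in this list could be
sketched as follows. Again this is just an illustration with
invented names: zlib stands in for the real compressor, the
cluster area is a bytearray in memory, and the `used` list is
assumed to have been derived from the first FAT:

```python
import zlib

SECTOR = 512


def compress_in_place(data, cluster_bytes, used):
    """Compress the cluster area `data` (a bytearray) in place.

    Returns (offsets, sizes, end), all in sector units. The tables
    are what would be written to the former 2nd FAT area.
    """
    sectors_per_cluster = cluster_bytes // SECTOR
    offsets, sizes = [], []
    write_pos = 0  # next free sector in the compressed area
    for index, in_use in enumerate(used):
        # Read the cluster before anything can overwrite it.
        raw = bytes(data[index * cluster_bytes:(index + 1) * cluster_bytes])
        offsets.append(write_pos)
        if not in_use:
            sizes.append(0)  # size 0 marks an unused cluster
            continue
        packed = zlib.compress(raw)
        n = -(-len(packed) // SECTOR)  # round up to whole sectors
        if n >= sectors_per_cluster:
            # No gain: store raw, so "full size" always means uncompressed.
            stored, n = raw, sectors_per_cluster
        else:
            stored = packed.ljust(n * SECTOR, b"\0")
        start = write_pos * SECTOR
        # write_pos never gets ahead of the read position, because no
        # stored cluster is bigger than an uncompressed one.
        data[start:start + len(stored)] = stored
        sizes.append(n)
        write_pos += n
    return offsets, sizes, write_pos
```

Everything from sector `end` onwards is free afterwards, which
is why the resulting image can be copied into a smaller file.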

Eric :-)

PS: 16bit OFFSETS would not be so useful, as they only allow
partition sizes of up to 32 MB if the unit is sectors ;-).
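
For the record, the offset limits mentioned above work out like
this (a tiny sketch, assuming 512 byte sectors as the unit; the
function name is made up):

```python
SECTOR = 512


def max_addressable_bytes(offset_bits, unit=SECTOR):
    """Largest area reachable with offsets of the given width."""
    return (1 << offset_bits) * unit


print(max_addressable_bytes(16) // 2**20, "MB")  # 16bit offsets: 32 MB
print(max_addressable_bytes(32) // 2**40, "TB")  # 32bit offsets: 2 TB
```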

