Hi everybody,

an old idea for FreeDOS is having a driver which
supports compressed filesystems... One idea for
implementation is this: Modify FAT16 by using 1
FAT and 1 table of compressed cluster sizes and
keep boot, FAT and root uncompressed. Only data
clusters would be compressed, e.g. using LZO, a
fast and small GPLed library. Decompression needs
no extra RAM, compression needs 8k or 64k of RAM.
The small version compiles to ca 5k x86 code...
LZO homepage: www.oberhumer.com/opensource/lzo/

You probably know SHSURDRV and MEMDISK, a normal
and a bootable RAMDISK which can load a compressed
disk image at start but need as much RAM as the
uncompressed image needs. Written data is lost at
reboot. You probably also know SHSUFDRV, a disk
which is backed by a diskimage file. This uses a
somewhat complicated method to avoid "using the
DOS kernel while it is already in use", so I am
interested to hear about your experience with it.

The suggested compressed filesystem can be:

1. something like SHSUFDRV but with a compressed diskimage

2. something like SHSURDRV but keeping data compressed in RAM

In addition, it can be:

A. readonly, which would save RAM needed for compression

B. writeable using a special area in the image (you would
  have some tool to compress modified data later offline)

C. writeable using a special tool which "downloads" changes
  and adds them to the image, either as in "raw B" or by
  directly creating a fully compressed updated image again.

Note that B. only works with 1. and C. works best with 2.

The idea behind C. "download changes" or B. "log changes"
is that whenever you write data, the cluster is either
flagged as dirty (if data still fits into the same area
of compressed data, possibly using padding) or written
to a flat "modified clusters" file. That file would have
to be created at image creation; it must be readonly as
a file and uncompressed. In the case of C, a tool would
access the "compression info" and update dirty clusters
in the image and update the modified clusters file there.
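
A minimal sketch of that write path, as a toy in-memory model.
Everything here is hypothetical: the class and field names are
made up for illustration, zlib stands in for LZO (which is not
in the Python standard library), and I assume the dirty flag is
only set for clusters rewritten inside their old slot.

```python
import zlib

DIRTY = 0x8000        # top bit of the info word
IN_MOD_FILE = 0x4000  # second bit from the top
VALUE_MASK = 0x3FFF   # low 14 bits


class CompressedImage:
    """Toy in-memory model; all names here are hypothetical."""

    def __init__(self, slot_bytes, mod_capacity):
        self.slot_bytes = list(slot_bytes)      # bytes reserved per cluster
        self.slots = [b""] * len(self.slot_bytes)
        self.info = [0] * len(self.slot_bytes)  # one 16-bit word per cluster
        self.mod_file = []                      # uncompressed clusters
        self.mod_capacity = mod_capacity        # fixed at image creation

    def write_cluster(self, n, data):
        if self.info[n] & IN_MOD_FILE:
            # Already moved out earlier: overwrite its mod-file entry.
            self.mod_file[self.info[n] & VALUE_MASK] = data
            return "mod file"
        packed = zlib.compress(data)            # stand-in for LZO
        if len(packed) % 2:
            packed += b"\0"                     # pad to a whole word
        if len(packed) <= self.slot_bytes[n]:
            # Still fits into the old area: keep it there, flag as dirty.
            self.slots[n] = packed
            self.info[n] = DIRTY | (len(packed) // 2)
            return "slot"
        if len(self.mod_file) >= self.mod_capacity:
            raise OSError("modified clusters file is full")
        # Does not fit: append uncompressed to the modified clusters file.
        self.mod_file.append(data)
        self.info[n] = IN_MOD_FILE | (len(self.mod_file) - 1)
        return "mod file"
```

Note that the mod file only ever grows in this sketch, which is
exactly the "no usage map" limitation listed further down.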

Suggested format of the "compression info" which "hides"
where the 2nd FAT copy would normally be (if you use DOS
block device access, you would be redirected to the 1st
FAT, making the device look like an uncompressed one)...

- for each cluster, there is a word of info

- if the top bit is set, the cluster is "dirty"
  (test sign to get flag for downloading changes)

- if the 2nd from top bit is set, the cluster is
  in the "modified cluster file" (shl, then test
  sign to get flag) and you have a gap in the
  main image; the gap size can be stored in an array
  in the modified cluster file at a fixed location

- the low 14 bits are either the compressed size
  in words (for info values 0-3fff and 8000-bfff)
  or the offset in the "modified cluster file" in
  clusters (for info values 4000-7fff, c000-ffff)
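
The bit layout above can be sketched as a pair of helpers. The
function and key names are hypothetical, chosen only for this
illustration; the bit positions are the ones from the list.

```python
DIRTY_BIT = 0x8000     # top bit: cluster is "dirty"
MODFILE_BIT = 0x4000   # 2nd from top: data is in the modified cluster file
VALUE_MASK = 0x3FFF    # low 14 bits: size in words or mod-file offset


def decode_info(word):
    """Split one 16-bit info word into its fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("info word must fit in 16 bits")
    value = word & VALUE_MASK
    in_mod_file = bool(word & MODFILE_BIT)
    return {
        "dirty": bool(word & DIRTY_BIT),
        "in_mod_file": in_mod_file,
        # Size is stored in 16-bit words, so multiply by 2 for bytes.
        "compressed_bytes": None if in_mod_file else value * 2,
        # Offset into the modified cluster file, counted in clusters.
        "mod_file_cluster": value if in_mod_file else None,
    }


def encode_info(dirty, in_mod_file, value):
    """Build an info word from its fields (inverse of decode_info)."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value must fit in 14 bits")
    word = value
    if dirty:
        word |= DIRTY_BIT
    if in_mod_file:
        word |= MODFILE_BIT
    return word
```

For example, decode_info(0x8123) describes a dirty cluster whose
compressed data is 0x123 words long, and decode_info(0x4005) a
clean cluster stored at offset 5 of the modified cluster file.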

Limitations of this design:

- the filesystem can be max 2 GBytes, 32 kB/cluster

- the modified cluster file cannot be resized
  and it has no usage map, so each time a new
  cluster fails to compress, it fills up more

- you either need ca 5+8 kB extra RAM to compress
  or all modified clusters go into that "mod file"

- you either need the SHSUFDRV image access tricks
  or as much RAM as the compressed image is big...

- the "gap size is in mod file" thing is ugly! Ideas?

- if you take the RAM approach, all changes since
  you last ran that "download dirty clusters to
  the compressed image" tool are lost at reboot.

- reducing the fill level of the "mod" file would
  only be possible by checking each cluster there
  for whether it can be "moved back into the gap"
  (depending on how compressible content is now)
  or by adjusting gap sizes but...

- ...you cannot grow gaps/slots without moving all
  the data after them which is very slow on disk.
  Of course you could read everything into 2 GB RAM
  and move around everything and write a new image.
  Or you could say slots never shrink beyond size X?
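
To put numbers on these limits, here is a small worked
calculation. It is only a sketch: the function name is made up,
and 2^16 clusters is used as an upper bound, while real FAT16
reserves a few cluster values and so stays slightly below it.

```python
def design_limits(cluster_bytes=32 * 1024):
    """Derive the size limits implied by 16-bit info words."""
    max_clusters = 1 << 16            # upper bound for a 16-bit FAT
    value_bits = 14                   # low 14 bits of each info word
    return {
        # Largest filesystem: every cluster at the largest cluster size.
        "max_fs_bytes": max_clusters * cluster_bytes,
        # Largest in-place compressed size: 0x3fff words of 2 bytes.
        "max_slot_bytes": ((1 << value_bits) - 1) * 2,
        # Largest modified cluster file: 14-bit offset, in clusters.
        "max_mod_file_bytes": (1 << value_bits) * cluster_bytes,
        # One info word per cluster, the same size as a FAT16 copy.
        "info_table_bytes": max_clusters * 2,
    }
```

With 32 kB clusters this gives 2 GB for the filesystem, at most
512 MB for the modified cluster file, and 128 kB for the info
table. The largest size a slot can record is 32766 bytes, just
under one full cluster, so a cluster that does not compress at
all cannot be described in place and has to go to the mod file.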

Soooo... I said in the subject that there would be
a poll, here you go... :-)  We can discuss your
opinion about as many of those points as you want.

- would you want a compressed filesystem to be writeable?

- should writes go to the host disk image immediately?

- can everything be in XMS or should it be disk-backed?

- should writes be compressed immediately?

- is FAT16 a good choice? and are cluster-sized slots a
  good choice? each cluster (0.5-32 kB) is (de)compressed
  separately

- is fully dynamic slot size a good choice? lower limits
  would increase chances to fit updated data in old slots

- if there is interest for a FAT32 version, maybe even in
  RAM, would a HIMEMXXL be possible which can give >4 GB?
  Minimal size for FAT32 filesystems is ca 33 MB, though.

- how compressible is the data you would put in compressed
  filesystems? and is the suggested driver useful at all
  compared to MEMDISK or SHSURDRV loading img.gz at init?
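
If you want to answer the compressibility question with numbers,
a rough way to measure it is to compress your data one cluster
at a time. This is only a sketch: zlib stands in for LZO here, so
real LZO ratios will differ (it trades ratio for speed), and the
function names and the 4 kB default are my own choices.

```python
import zlib


def cluster_ratios(data, cluster_bytes=4096):
    """Compress each cluster separately; return compressed/original sizes."""
    ratios = []
    for start in range(0, len(data), cluster_bytes):
        chunk = data[start:start + cluster_bytes]
        ratios.append(len(zlib.compress(chunk)) / len(chunk))
    return ratios


def summarize(data, cluster_bytes=4096):
    """Return (mean ratio, number of clusters that did not shrink)."""
    ratios = cluster_ratios(data, cluster_bytes)
    if not ratios:
        return 0.0, 0
    not_shrunk = sum(1 for r in ratios if r >= 1.0)
    return sum(ratios) / len(ratios), not_shrunk
```

Reading a disk image or a large file with open(path, "rb").read()
and passing it to summarize() gives a first impression. The
second number is the count of clusters which would end up in the
"mod" file when written.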

Thanks for commenting :-)


PS: FAT32 needs > 64k clusters, which means FATs waste
some space, but otoh, one could store BOTH gap size and
mod file slot number together in those 32 bit / cluster.

Freedos-user mailing list