Hello Paul and others listening, sorry for the somewhat longer delay...
> > /dev/urandom does *not* provide high quality pseudo-random numbers,
>
> Yes, and thanks for your comments, but the existing substitute doesn't
> provide them either, so from the point of view of randomness quality
> it wouldn't be a loss to use /dev/urandom if available.

That's true too, but it should be sufficient to use rand(), or lrand48()
in case the C library is older (or even random(), if it's much older),
to shuffle the input lines into random order, seeded with the current
time - so you wouldn't need to provide a pseudo-random number generator
either (see the small sketch further down).

> An advantage of using /dev/urandom is that, if the lack of quality is
> an issue, we can blame the kernel rather than blaming our own code.
> I'm half-joking here, but I'm half-serious as well.  If coreutils can
> avoid the hassle of providing support for reasonably-high-quality
> random numbers, then I'd rather go that route.  :-)

But as I've stated above, if it's just needed for shuffling some input
lines, IMHO it's more than sufficient to use what the C library already
provides in order to get some randomness, so you could safely eliminate
the number generator, thereby reducing maintenance.

> > Even if considering `shred' (or the various wipe tools available),
> > there is mostly no need for good randomness, or even randomness at all,
>
> OK, in that case then let's modify 'shred' so that it doesn't rely on
> random numbers at all.  (It'd make for one less red herring to kill....)

After looking at `shred.c', I understand very well why you would want the
number generator eliminated.  But since the source file references
Gutmann's article, I would like to clarify a few things: what the info
manual says about `shred' gives a false sense of certainty that the data
really gets wiped out, although it doesn't.

Without going into too much detail: the algorithm uses knowledge about
the way *older* hard drives stored information on disk, namely run-length
encoding with a simple error-correction scheme.  Gutmann's algorithm
overwrites the file in question with every bit combination relevant to
that RLE/ECC scheme, shifted through all positions, in order to saturate
the magnetic particles until the effect is equivalent to "hammering"
all-ones onto the disk.  This worked well for really old drives - we're
talking about 20-year-old drives here - but not for new ones.  Still, the
10 stages for MFM recording do hold for wiping floppy disks even today,
although it is faster and more secure to simply break the floppy disk
into pieces instead of doing 10 passes and waiting for minutes.

Modern hard drives (i.e. every disk not older than, say, 10 years) are
EPRML encoded with up to 200 ECC bits, not only to be able to correct 30+
bytes of burst errors, but also because today's high capacities couldn't
be achieved otherwise.  For those drives it is said that data overwritten
with up to 2, *maybe* 3 passes of different data *may* still be
recoverable under the best circumstances; which means it is practically
safe to do, say, 4-5 passes with different bit patterns in order to wipe
the information away - this could be done by using 4-5 randomly selected
patterns from the 27-pattern pool.  But the main problem is not the
selected patterns themselves - the problem really is getting the data
written to the disk.
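
Coming back to the shuffling question for a moment, here is a minimal
sketch of what I mean by "use what the C library provides".  The function
name is mine and it assumes the input lines have already been read into
an array - it's an illustration, not a patch:

  #include <stdlib.h>   /* srand48, lrand48 */
  #include <time.h>     /* time */

  /* Shuffle the NLINES pointers in LINES in place (Fisher-Yates),
     seeded with the current time.  Slightly biased and trivially
     predictable, but plenty for putting input lines in random order.  */
  static void
  shuffle_lines (char **lines, size_t nlines)
  {
    if (nlines < 2)
      return;
    srand48 ((long) time (NULL));
    for (size_t i = nlines - 1; i > 0; i--)
      {
        /* pick j from 0..i and swap */
        size_t j = (size_t) (lrand48 () % (long) (i + 1));
        char *tmp = lines[i];
        lines[i] = lines[j];
        lines[j] = tmp;
      }
  }

The modulo introduces a tiny bias and the seed is guessable, but for
reordering lines that hardly matters.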
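
And to make the 4-5 pass idea concrete, it boils down to roughly the
following sketch.  The pattern table and the function are invented by me
for illustration - this is not how `shred' actually works - and, as the
next paragraphs explain, even the fsync() after each pass is no guarantee
that the bits really reach the platters:

  #include <stdlib.h>     /* srand48, lrand48 */
  #include <sys/types.h>  /* off_t */
  #include <time.h>
  #include <unistd.h>     /* lseek, write, fsync */

  /* A few invented 3-byte example patterns - stand-ins for entries of
     the real pattern pool, not a copy of it.  */
  static const unsigned char patterns[][3] = {
    { 0x55, 0x55, 0x55 }, { 0xAA, 0xAA, 0xAA }, { 0x92, 0x49, 0x24 },
    { 0x49, 0x24, 0x92 }, { 0x24, 0x92, 0x49 }, { 0x00, 0x00, 0x00 },
    { 0xFF, 0xFF, 0xFF },
  };
  enum { NPATTERNS = sizeof patterns / sizeof patterns[0] };

  /* Overwrite the first SIZE bytes behind FD with NPASSES randomly
     chosen patterns, fsync()ing after each pass.  Patterns may repeat;
     error handling is minimal - it is only a sketch.  */
  static int
  wipe_passes (int fd, off_t size, int npasses)
  {
    unsigned char buf[4096];

    srand48 ((long) time (NULL));
    for (int pass = 0; pass < npasses; pass++)
      {
        const unsigned char *p = patterns[lrand48 () % NPATTERNS];
        for (size_t i = 0; i < sizeof buf; i++)
          buf[i] = p[i % 3];

        if (lseek (fd, 0, SEEK_SET) < 0)
          return -1;
        for (off_t left = size; left > 0; )
          {
            size_t n = left < (off_t) sizeof buf ? (size_t) left : sizeof buf;
            if (write (fd, buf, n) != (ssize_t) n)
              return -1;
            left -= n;
          }
        if (fsync (fd) < 0)   /* push it towards the platters... hopefully */
          return -1;
      }
    return 0;
  }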
One thing which isn't clearly stated in the manual is the fact that, even
when syncing the data to disk on a journaling file-system, the driver
reports a successful write as soon as the data has been physically
written to the journal.  If a second write operation arrives while the
file-system is busy - which is the normal case on a multi-user system -
the file-system driver may not find the time to synchronize the journal
entry with the physical position of the file in question.  When the
second write operation with the next pattern then arrives, only the old
journal entry gets updated, and not the file itself.  This means that
even a synchronized write operation on a journaling file-system mostly
does *not* reach the file's own blocks, which in turn means that
journaling *needs* to be turned off for this operation.  But even if the
file-system is not a journaling one, it's very difficult to get the
various passes written to disk in order.

I'm only talking about SCSI disks now, since I know nothing about EIDE
drives, but most of this will apply to them as well.  Typical SCSI disks,
desktop or server variants, have 8 MB of cache RAM inside the drive -
server-only SCSI disks are available with 16 MB of cache RAM.  If we
think about some secure data that must be wiped out, we are talking about
at most 4 kB, so the wiping process will never overflow the drive's
internal caches.  Furthermore, SCSI disks operate with command queuing,
which means the drives will rearrange the SCSI commands at will (it
actually depends on where the heads are and which read/write operations
are outstanding).  So in order to wipe a file located on a SCSI disk, one
needs to perform the following tasks (a related SG_IO sketch follows
further down):

 · deactivate the write cache, which slows down write operations heavily
   (it also needs to be taken into account that the drive's firmware may
   be buggy and not honor this request, or only honor it after a
   spin-down)
 · activate the forced-media-access bit, which degrades performance
   *heavily* - it must be noted here that deactivating write caching
   alone does not ensure that things get written when requested, so this
   is the most important option needed...
 · deactivate command linking, if available and modifiable
 · deactivate command reordering (or tagged command queuing)

All these modifications to the drive's mode pages can only be done by the
super-user, and only if the underlying SCSI driver supports those
operations; besides, one must be *very* careful, since changing the wrong
bit might break the drive's operation - many drives use reserved bits for
manufacturer-dependent actions.  E.g. my 3 IBM drives support a mode
called `auto-maintenance-mode' through (undocumented) reserved bits,
which means they do a spin-down/spin-up cycle every 6-7 days, moving
their heads into the parking zone and cleaning them so they don't get
destroyed while running continuously for months, even though those disks
are desktop variants.

Another problem are dynamically attached sectors, sometimes called
notched pages: the drive might internally relink sectors on the fly
during long writes, without overwriting the old sector; instead that
sector is only marked free, so nothing gets overwritten.  (Note that I'm
*not* talking about remapping defective sectors here...)
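
Changing the mode pages themselves is too much for a short example, but
the related "flush your write cache now" request can at least be shown.
A rough sketch using Linux's SG_IO ioctl - again my own illustration,
nothing from `shred'; it needs the raw device (e.g. opened with
open("/dev/sda", O_RDWR)) and normally root privileges:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <scsi/sg.h>

  /* Send SYNCHRONIZE CACHE (10) to the drive behind FD, asking it to
     flush its internal write cache to the media.  Returns 0 on success,
     -1 on failure (including the drive rejecting the command).  */
  static int
  scsi_sync_cache (int fd)
  {
    unsigned char cdb[10] = { 0x35, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    unsigned char sense[32];
    struct sg_io_hdr io;

    memset (&io, 0, sizeof io);
    io.interface_id = 'S';
    io.cmd_len = sizeof cdb;
    io.cmdp = cdb;
    io.dxfer_direction = SG_DXFER_NONE;  /* no data phase for this command */
    io.sbp = sense;
    io.mx_sb_len = sizeof sense;
    io.timeout = 60000;                  /* milliseconds */

    if (ioctl (fd, SG_IO, &io) < 0)
      return -1;
    if ((io.info & SG_INFO_OK_MASK) != SG_INFO_OK)
      return -1;                         /* check the sense data for details */
    return 0;
  }

Whether a drive with buggy firmware actually honours it is another story,
of course.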
Now one could think that a low-level format might erase the former
contents.  This is wrong - the drive only arranges the sectors for the
controller's specific interleaving and nothing more (I know this because
I once repartitioned an older hard disk of mine after low-level
formatting, accessed it without creating a file-system again - and was
very surprised to find all my stuff still there...).  Only defective
sectors get remapped through the low-level formatting process, as
otherwise it would take very much longer - but to be precise, some newer
SCSI-3 disks might provide a way to really overwrite the previous
contents...

So the question is - what is `shred' good for?  It might give someone a
good feeling of being sure that some sensitive data has been erased - or
not - but that's all.  ;-)  Just as an example: consider a current SCSI
drive with 76 GB capacity and some 3-4 kB of sensitive data overwritten
with junk - it will surely be a *very* time-consuming task for someone to
locate exactly those 4 kB out of the 76 GB and to try to recover the data
previously written there...

As far as the renaming operation is concerned, I personally think it is
complete overkill - instead it would be of somewhat more gain to reduce
the size of the file in several steps, hoping that the underlying
file-system driver reallocates blocks, so that the previous and
(hopefully) overwritten blocks get freed and attached to other files
during the process, although that might not always be the case...

BTW, it looks like `shred.c' is based on a very early version of ya-wipe
v1.0?  That wipe, v2.1, dated 2001, uses the Mersenne Twister for
producing pseudo-random numbers, which is much more compact and easier to
read - and IMHO should be faster than ISAAC...  But then, even for this
case - selecting some garbage for overwriting - it should be sufficient
to use what the C library already provides.

THX for listening.

CU Tom.
(Thomas M.Ott)
Germany


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils