Hello Paul and others listening, sorry for the somewhat longer delay...
> > /dev/urandom does *not* provide high quality pseudo-random numbers,
>
> Yes, and thanks for your comments, but the existing substitute doesn't
> provide them either, so from the point of view of randomness quality
> it wouldn't be a loss to use /dev/urandom if available.

That's true too, but it should be sufficient to use rand(), or lrand48()
in case the C library is older (or even random(), if it's much older),
to shuffle the input lines into random order, seeded with the current
time - so you wouldn't need to provide a pseudo-random number generator
either (see the small sketch further down).

> An advantage of using /dev/urandom is that, if the lack of quality is
> an issue, we can blame the kernel rather than blaming our own code.
> I'm half-joking here, but I'm half-serious as well.  If coreutils can
> avoid the hassle of providing support for reasonably-high-quality
> random numbers, then I'd rather go that route.  :-)

But as I've stated above, if it's just needed for shuffling some input
lines, IMHO it's more than sufficient to use what the C library already
provides in order to get some randomness, so you could safely eliminate
the number generator, thereby reducing maintenance.

> > Even if considering `shred' (or the various wipe tools available),
> > there is mostly no need for good randomness, or even randomness at all,
>
> OK, in that case then let's modify 'shred' so that it doesn't rely on
> random numbers at all.  (It'd make for one less red herring to kill....)

After looking at `shred.c', I understand very well why you would want the
number generator eliminated.  But since the source file references
Gutmann's article, I would like to clarify a few things: what the info
manual says about `shred' gives a false sense of certainty that the data
really gets wiped out, although it doesn't.

Without going into too much detail: the algorithm uses knowledge about
the way *older* hard drives stored information on disk, namely run-length
encoding with a simple error-correction scheme.  Gutmann's algorithm
overwrites the file in question with every bit combination relevant to
that RLE/ECC scheme, shifted through all positions, in order to saturate
the magnetic particles until the effect is equivalent to "hammering"
all-ones onto the disk.  This worked well for really old drives - we're
talking about 20-year-old drives here - but not for new ones.  Still, the
10 stages for MFM recording do hold for wiping floppy disks even today,
although it is faster and more secure to simply break the floppy disk
into pieces instead of doing 10 passes and waiting for minutes.

Modern hard drives (i.e. every disk not older than, say, 10 years) are
EPRML encoded with up to 200 ECC bits, not only to be able to correct 30+
bytes of burst errors, but also because today's high capacities couldn't
be achieved otherwise.  For those drives it is said that data overwritten
with up to 2, *maybe* 3 passes of different data *may* still be
recoverable under the best circumstances; which means it is practically
safe to do, say, 4-5 passes with different bit patterns in order to wipe
the information away - this could be done by using 4-5 randomly selected
patterns from the 27-pattern pool.  But the main problem is not the
selected patterns themselves - the problem really is getting the data
written to the disk.
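
Coming back to the shuffling question for a moment, here is a minimal
sketch of what I mean by "use what the C library provides".  The function
name is mine and it assumes the input lines have already been read into
an array - it's an illustration, not a patch:

  #include <stdlib.h>   /* srand48, lrand48 */
  #include <time.h>     /* time */

  /* Shuffle the NLINES pointers in LINES in place (Fisher-Yates),
     seeded with the current time.  Slightly biased and trivially
     predictable, but plenty for putting input lines in random order.  */
  static void
  shuffle_lines (char **lines, size_t nlines)
  {
    if (nlines < 2)
      return;
    srand48 ((long) time (NULL));
    for (size_t i = nlines - 1; i > 0; i--)
      {
        /* pick j from 0..i and swap */
        size_t j = (size_t) (lrand48 () % (long) (i + 1));
        char *tmp = lines[i];
        lines[i] = lines[j];
        lines[j] = tmp;
      }
  }

The modulo introduces a tiny bias and the seed is guessable, but for
reordering lines that hardly matters.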
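
And to make the 4-5 pass idea concrete, it boils down to roughly the
following sketch.  The pattern table and the function are invented by me
for illustration - this is not how `shred' actually works - and, as the
next paragraphs explain, even the fsync() after each pass is no guarantee
that the bits really reach the platters:

  #include <stdlib.h>     /* srand48, lrand48 */
  #include <sys/types.h>  /* off_t */
  #include <time.h>
  #include <unistd.h>     /* lseek, write, fsync */

  /* A few invented 3-byte example patterns - stand-ins for entries of
     the real pattern pool, not a copy of it.  */
  static const unsigned char patterns[][3] = {
    { 0x55, 0x55, 0x55 }, { 0xAA, 0xAA, 0xAA }, { 0x92, 0x49, 0x24 },
    { 0x49, 0x24, 0x92 }, { 0x24, 0x92, 0x49 }, { 0x00, 0x00, 0x00 },
    { 0xFF, 0xFF, 0xFF },
  };
  enum { NPATTERNS = sizeof patterns / sizeof patterns[0] };

  /* Overwrite the first SIZE bytes behind FD with NPASSES randomly
     chosen patterns, fsync()ing after each pass.  Patterns may repeat;
     error handling is minimal - it is only a sketch.  */
  static int
  wipe_passes (int fd, off_t size, int npasses)
  {
    unsigned char buf[4096];

    srand48 ((long) time (NULL));
    for (int pass = 0; pass < npasses; pass++)
      {
        const unsigned char *p = patterns[lrand48 () % NPATTERNS];
        for (size_t i = 0; i < sizeof buf; i++)
          buf[i] = p[i % 3];

        if (lseek (fd, 0, SEEK_SET) < 0)
          return -1;
        for (off_t left = size; left > 0; )
          {
            size_t n = left < (off_t) sizeof buf ? (size_t) left : sizeof buf;
            if (write (fd, buf, n) != (ssize_t) n)
              return -1;
            left -= n;
          }
        if (fsync (fd) < 0)   /* push it towards the platters... hopefully */
          return -1;
      }
    return 0;
  }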
One thing which isn't clearly stated in the manual is the fact that, even
when syncing the data to disk on a journaling file-system, the driver
reports a successful write as soon as the data has been physically
written to the journal.  If a second write operation arrives while the
file-system is busy - which is the normal case on a multi-user system -
the file-system driver may not find the time to synchronize the journal
entry with the physical position of the file in question.  When the
second write operation with the next pattern then arrives, only the old
journal entry gets updated, and not the file itself.  This means that
even a synchronized write operation on a journaling file-system mostly
does *not* reach the file's own blocks, which in turn means that
journaling *needs* to be turned off for this operation.  But even if the
file-system is not a journaling one, it's very difficult to get the
various passes written to disk in order.

I'm only talking about SCSI disks now, since I know nothing about EIDE
drives, but most of this will apply to them as well.  Typical SCSI disks,
desktop or server variants, have 8 MB of cache RAM inside the drive -
server-only SCSI disks are available with 16 MB of cache RAM.  If we
think about some secure data that must be wiped out, we are talking about
at most 4 kB, so the wiping process will never overflow the drive's
internal caches.  Furthermore, SCSI disks operate with command queuing,
which means the drives will rearrange the SCSI commands at will (it
actually depends on where the heads are and which read/write operations
are outstanding).  So in order to wipe a file located on a SCSI disk, one
needs to perform the following tasks (a related SG_IO sketch follows
further down):

 · deactivate the write cache, which slows down write operations heavily
   (it also needs to be taken into account that the drive's firmware may
   be buggy and not honor this request, or only honor it after a
   spin-down)
 · activate the forced-media-access bit, which degrades performance
   *heavily* - it must be noted here that deactivating write caching
   alone does not ensure that things get written when requested, so this
   is the most important option needed...
 · deactivate command linking, if available and modifiable
 · deactivate command reordering (or tagged command queuing)

All these modifications to the drive's mode pages can only be done by the
super-user, and only if the underlying SCSI driver supports those
operations; besides, one must be *very* careful, since changing the wrong
bit might break the drive's operation - many drives use reserved bits for
manufacturer-dependent actions.  E.g. my 3 IBM drives support a mode
called `auto-maintenance-mode' through (undocumented) reserved bits,
which means they do a spin-down/spin-up cycle every 6-7 days, moving
their heads into the parking zone and cleaning them so they don't get
destroyed while running continuously for months, even though those disks
are desktop variants.

Another problem are dynamically attached sectors, sometimes called
notched pages: the drive might internally relink sectors on the fly
during long writes, without overwriting the old sector; instead that
sector is only marked free, so nothing gets overwritten.  (Note that I'm
*not* talking about remapping defective sectors here...)
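
Changing the mode pages themselves is too much for a short example, but
the related "flush your write cache now" request can at least be shown.
A rough sketch using Linux's SG_IO ioctl - again my own illustration,
nothing from `shred'; it needs the raw device (e.g. opened with
open("/dev/sda", O_RDWR)) and normally root privileges:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <scsi/sg.h>

  /* Send SYNCHRONIZE CACHE (10) to the drive behind FD, asking it to
     flush its internal write cache to the media.  Returns 0 on success,
     -1 on failure (including the drive rejecting the command).  */
  static int
  scsi_sync_cache (int fd)
  {
    unsigned char cdb[10] = { 0x35, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    unsigned char sense[32];
    struct sg_io_hdr io;

    memset (&io, 0, sizeof io);
    io.interface_id = 'S';
    io.cmd_len = sizeof cdb;
    io.cmdp = cdb;
    io.dxfer_direction = SG_DXFER_NONE;  /* no data phase for this command */
    io.sbp = sense;
    io.mx_sb_len = sizeof sense;
    io.timeout = 60000;                  /* milliseconds */

    if (ioctl (fd, SG_IO, &io) < 0)
      return -1;
    if ((io.info & SG_INFO_OK_MASK) != SG_INFO_OK)
      return -1;                         /* check the sense data for details */
    return 0;
  }

Whether a drive with buggy firmware actually honours it is another story,
of course.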
Now one could think that a low-level format might erase the former
contents.  This is wrong - the drive only arranges the sectors for the
controller's specific interleaving and nothing more (I know this because
I once repartitioned an older hard disk of mine after low-level
formatting, accessed it without creating a file-system again - and was
very surprised to find all my stuff still there...).  Only defective
sectors get remapped through the low-level formatting process, as
otherwise it would take very much longer - but to be precise, some newer
SCSI-3 disks might provide a way to really overwrite the previous
contents...

So the question is - what is `shred' good for?  It might give someone a
good feeling of being sure that some sensitive data has been erased - or
not - but that's all.  ;-)  Just as an example: consider a current SCSI
drive with 76 GB capacity and some 3-4 kB of sensitive data overwritten
with junk - it will surely be a *very* time-consuming task for someone to
locate exactly those 4 kB out of the 76 GB and to try to recover the data
previously written there...

As far as the renaming operation is concerned, I personally think it is
complete overkill - instead it would be of somewhat more gain to reduce
the size of the file in several steps, hoping that the underlying
file-system driver reallocates blocks, so that the previous and
(hopefully) overwritten blocks get freed and attached to other files
during the process, although that might not always be the case...

BTW, it looks like `shred.c' is based on a very early version of ya-wipe
v1.0?  That wipe, v2.1, dated 2001, uses the Mersenne Twister for
producing pseudo-random numbers, which is much more compact and easier to
read - and IMHO should be faster than ISAAC...  But then, even for this
case - selecting some garbage for overwriting - it should be sufficient
to use what the C library already provides.

THX for listening.

CU Tom.
(Thomas M.Ott)
Germany


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils