I wrote up some advanced information on ddrescue in a forum a few years
back. I think I will post it directly here, breaking it up into a couple
of parts as I did when I originally posted it. Among other things, it
covers how to skip out of a bad head reasonably quickly. I am just going
to copy and paste from my original Word documents, so the formatting may
not be the best.
Part 1
Ddrescue: Advanced Understanding
This thread is meant to be a place to discuss GNU ddrescue, both how it
works and how to use it to its full potential. I will be adding to this
on an ongoing basis. There is far too much to discuss in just one post
(or even a few posts).
First, an explanation of what ddrescue is: Ddrescue is free,
open-source disk-cloning software. Its purpose is to copy data from a
failing drive, and it does this at the sector level. Its algorithm
does the best it can to recover the most easily readable data first
before trying really hard at the bad areas. In my opinion, it is the
best freeware option for this job.
What ddrescue does not do: It does not recover specific files. It
doesn’t care what the file system is. It just copies data at the
sector level, so in no way does it process files. It only processes
the raw drive. Ddrescue also does not use any direct disk commands. It
uses generic read commands, which allows it to be compiled and run on
different POSIX-based systems. I do have a patch for it that allows
the use of ATA passthrough commands on Linux, but that will be
discussed later.
Now let’s take a look at the algorithm. I am going to focus on the
most current version, which at the time of this writing is 1.19. I
feel that 1.19 is far better than previous versions, and the previous
versions do not have this same algorithm. There are three phases to
the recovery: the copy phase, the trimming phase, and the scraping
phase. The copy phase itself does three passes. The first pass is
forwards. If you just run a default command such as “ddrescue /dev/sda
image.dd image.log”, it will read the default of 128 sectors at a time
(65536 bytes). When it hits a drive error, it marks that block as
non-trimmed, skips the next 65536 bytes (by default), which are marked
as non-tried, and then attempts to continue reading. If the next read
is also bad, the skip size is doubled, and it will keep doubling until
it hits the maximum of 1GiB or 1% of the drive size, whichever is
smaller. When it reaches the end of the drive, it then does the same
thing backwards (pass 2), reading only the areas that were marked as
non-tried (skipped).
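To make the defaults concrete, here is a rough sketch (the device and
file names are placeholders, as in the example command above):

  # First pass with all defaults: 64KiB (128-sector) reads, 64KiB initial skip
  ddrescue /dev/sda image.dd image.log
  # Consecutive bad reads double the skip size:
  #   64KiB -> 128KiB -> 256KiB -> ... -> min(1GiB, 1% of drive size)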
Before we get into copy pass 3, let’s look at the first two passes.
The first pass is designed to skip out of bad areas as fast as
possible. However, as the skip size grows, it is possible to skip past
a big chunk of good data before reading starts again. Since the second
pass does the same thing backwards, it should normally catch most of
the good data that sat at the end of bad areas from the first pass.
You may notice that reverse reads are much slower than forward reads.
This is because drives normally have a look-ahead feature that reads
ahead and stores the data in an internal buffer, and this only works
when reading forwards. If you send a special command to the drive to
turn off this feature, you will find that forward and reverse reads
run at about the same speed.
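For example, on Linux one common way to send that command is hdparm's
-A flag (a sketch only; not every drive or USB bridge honors it):

  # Turn off the drive's read look-ahead, then back on when done
  hdparm -A0 /dev/sda
  hdparm -A1 /dev/sda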
Now it would help to understand how the data is stored on the
platters. A typical disk can have between 1 and 4 platters, and 2 to 8
heads. The data is actually stored in small groups that could be 100MB
or less, up to 1GB or more, depending on the drive. So for example, if
the group size were exactly 100MB, then on a 2-platter, 4-head drive
the first 0-100MB would be read from head 1, 100-200MB from head 2,
200-300MB from head 3, and 300-400MB from head 4. Then the next
400-500MB would go back to head 1, and so on. So as you can see, the
data is not all in straight-line order. There are normally two basic
hard drive errors
(ones that can be worked with using ddrescue). The first is a damaged
area on one of the platters. The size of this error can vary, and the
error can span multiple groups on the head. A damaged platter can also
cause head damage (or further head damage) when the head passes over
it. The less time spent in this area, the better. The second common
error is a weak or damaged head, which will affect reads across the
entire disk. I have seen more than one logfile that shows this: there
are usually many small errors spaced a bit apart, usually with
something of a pattern that can only be seen by examining the logfile.
You can use ddrescueview to see a visual reference of the errors
caused by the bad head, and you can also use it to get an idea of the
head's data group size.
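As a sketch of that kind of inspection (file names are placeholders;
each data line in the logfile is “position size status”, where '-'
marks bad sectors and '*' marks non-trimmed blocks):

  # Visual error map of the rescue so far
  ddrescueview image.log
  # Or list the positions/sizes of the problem areas to eyeball their
  # spacing, which hints at the head's data group size
  awk '$3 == "-" || $3 == "*" {print $1, $2}' image.log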
So how can we best deal with this? I like to think that the
skip-out-fast method would usually be the best. This method involves
using the --skip-size option to set both the skip size and the max
skip size. By default the skip size is 64KiB and the max is either
1GiB or 1% of the drive size, whichever is smaller. So, for example,
if we use ddrescueview (or examine the logfile) early on in the rescue
and estimate from the error pattern that the data group size is about
100MB, then we might want to go with something like a 5Mi skip
size with a 10Mi max ("--skip-size=5Mi,10Mi"). We want to keep
skipping out of the bad head as fast as possible on the first pass,
but don't want to skip way too far out if we can help it. The untried
area that is skipped past, away from the bad head, will get processed
by the reverse pass (a nice benefit of that pass). This means we can
skip out big and fast if we want to, but understand that reverse reads
are usually slower than forward reads. You also don't want to allow
skipping more than halfway to the next bad read, or good data could be
missed on the reverse pass and would have to wait for the third,
no-skip pass. The skip-out-fast method will also work for a damaged
area on the platter, although you will likely not know the group size
in advance. The big benefit of this method is getting the
most good data as fast as possible before working on the problem areas.
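Putting it together, a skip-out-fast run for that estimated ~100MB
group size might look like this (device and file names are again
placeholders):

  # 5MiB initial skip, 10MiB max: bail out of a bad-head group quickly
  # on the first pass without overshooting too far past it
  ddrescue --skip-size=5Mi,10Mi /dev/sda image.dd image.log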
We have only covered the first two copy passes, but that is enough for
the first post (I am losing focus). More to come soon…
On 1/4/2018 6:02 AM, Peter Clifton wrote:
Hi,
I've been dumping a disk with ddrescue for a friend, and it occurred to me that
one feature present in hardware based / proprietary recovery tools (as far as I
could discern from watching youtube videos of professional recovery), is
bad-head mapping.
The pattern of slow / bad reads from this particular disk appears to be 75%
good, 25% bad, in a fairly regular pattern. I know the disk has 2x platters, 4x
heads, so this suggests (possibly), a damaged region of one platter face, or
one read head wearing or damaged more significantly than the others.
I was curious as to whether you had suggestion how (or interest in adding a
feature), to have ddrescue focus on the 3/4 of the disk which is more readily
accessible.