I have been recovering data from a hard drive, and the recovery has been taking a very long time to complete. I have been looking at the way in which ddrescue attempts to read data to see why this is. I think that in a situation where the drive has many bad sectors spread across the whole disk, as opposed to contained within a few areas, ddrescue does not approach the problem optimally.
If I have understood the way the splitting phase works at present, the drive is read sequentially, starting from the first unsplit area and moving forwards, until a certain number of bad sectors have been encountered, at which point it jumps an arbitrary distance ahead. It then repeats this, gradually breaking down the unsplit areas until every sector has either been read and recovered or marked bad. When there are only a few areas of bad sectors this approach works quite well, but with larger numbers of bad sectors it is painfully slow. The reason is that the time penalty for reading a bad sector can be of the order of seconds for each one. When it is attempting to read 8 or more consecutive bad sectors before skipping, it can spend a minute or more between skips.

From looking at the way bad sectors and errors seem to appear on drives, I have made a couple of observations:

1. The number of bad sectors on even a quite badly damaged drive is often very small in comparison to the number of good sectors.
2. The good sectors tend to be in contiguous areas.
3. The bad sectors also tend to be in contiguous (although smaller) areas.

From this, I think the following assumptions can be made: if you choose any particular good sector, the probability is that it will be next to another good sector; if you choose any bad sector, the probability is that it will be next to another bad sector. Minimising the attempts at reading bad sectors will therefore greatly reduce the time taken to recover the majority of the good data.

My suggested algorithm, applied after trimming, is as follows (a rough sketch of the selection logic is given at the end of this message):

1. Examine the log file to locate the largest unsplit area on the disk that is directly adjacent to a known good area.
2. Begin reading the unsplit area in the direction away from the known good area.
3. Upon encountering 2 bad sectors in a row, stop (since the probability is that the next sector will also be bad).
4. Reread the log file to determine the next largest unsplit area adjacent to a known good area and go back to step 2.
5. When there are no remaining unsplit areas next to good areas, choose the largest unsplit area and begin reading from the middle, not the edge.
6. Keep doing this until the unsplit areas are all below an arbitrary minimum size, at which point go back to reading linearly.

This approach won't reduce the time taken to exhaustively recover a drive to the last sector, but it would shift the very slow reading of bad sectors towards the end of the process. It has the advantage that it gets the maximum amount of data off the drive as quickly as possible.

This is largely conjecture since I am not a programmer, but having done some limited testing by manually editing the log file to start in different places, it looks like it could give a significant improvement.
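
To make the selection logic concrete, here is a minimal Python sketch of steps 1, 2, 5 and 6. It is only an illustration, not ddrescue code: it assumes a log file made up of "position size status" lines (with '/' marking unsplit and '+' marking finished areas), and it only chooses the next block and starting point. The read loop itself, including stopping after two consecutive bad sectors (step 3), is not shown, and the function names and minimum-size threshold are mine.

    from collections import namedtuple

    Block = namedtuple('Block', 'pos size status')

    def parse_log(path):
        # Read a ddrescue log file into a list of Block records, skipping
        # comment lines and the two-field "current position" line.
        blocks = []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) != 3 or fields[0].startswith('#'):
                    continue
                pos, size, status = fields
                blocks.append(Block(int(pos, 0), int(size, 0), status))
        return blocks

    def choose_next_block(blocks, min_size=65536):
        # Return (block, start_offset, direction) for the next unsplit ('/')
        # area to read, or None if there is nothing left to split.
        unsplit = [(i, b) for i, b in enumerate(blocks) if b.status == '/']
        if not unsplit:
            return None

        # Steps 1 and 2: the largest unsplit area directly adjacent to a
        # finished ('+') area, read in the direction away from the good data.
        candidates = []
        for i, b in unsplit:
            if i > 0 and blocks[i - 1].status == '+':
                # Good data just before it: read forwards from the start.
                candidates.append((b.size, b, b.pos, +1))
            elif i + 1 < len(blocks) and blocks[i + 1].status == '+':
                # Good data just after it: read backwards from the end.
                candidates.append((b.size, b, b.pos + b.size, -1))
        if candidates:
            size, b, start, direction = max(candidates)
            return b, start, direction

        # Step 5: no unsplit area borders good data, so start from the middle
        # of the largest one -- unless everything left is below the arbitrary
        # minimum size, in which case fall back to linear reading (step 6).
        b = max((b for _, b in unsplit), key=lambda b: b.size)
        if b.size < min_size:
            return b, b.pos, +1
        return b, b.pos + b.size // 2, +1

In a real implementation this choice would of course be re-evaluated against the log file each time reading stops, as in step 4.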
