I wrote up some advanced information on ddrescue in a forum a few years
back. I think I will post it directly here, breaking it up into a couple
of parts as I did when I originally posted it. Among other things, it
covers how to skip out of a bad head reasonably quickly. I am just going
to copy and paste from my original Word documents, so the formatting may
not be the best.
Part 1
Ddrescue: Advanced Understanding
This thread is meant to be a place to discuss GNU ddrescue, both how it
works and how to use it to its full potential. I will be adding to this
on an ongoing basis. There is far too much to discuss in just one post
(or even a few posts).
First, an explanation of what ddrescue is: Ddrescue is free,
open-source disk-cloning software. Its purpose is to copy data from a
failing drive, and it does this at the sector level. Its algorithm
does the best it can to recover the most easily readable data first
before trying really hard at the bad areas. In my opinion, it is the
best freeware option for this job.
What ddrescue does not do: It does not recover specific files. It
doesn’t care what the file system is. It just copies data at the
sector level, so in no way does it process files. It only processes
the raw drive. Ddrescue also does not use any direct disk commands. It
uses generic read commands, which allows it to be compiled and run on
different POSIX-based systems. I do have a patch for it that allows
the use of ATA passthrough commands on Linux, but that will be
discussed later.
Now let’s take a look at the algorithm. I am going to focus on the
most current version, which at the time of this writing is 1.19. I
feel that 1.19 is far better than previous versions, and the previous
versions do not have this same algorithm. There are three phases to
the recovery: the copy phase, the trimming phase, and the scraping
phase. The copy phase itself does three passes. The first pass is
forwards. If you just run a default command such as “ddrescue /dev/sda
image.dd image.log”, it will read the default of 128 sectors at a time
(65536 bytes). When it hits a drive error, it marks that block as
non-trimmed, skips the next 65536 bytes (by default), which are marked
as non-tried, and then attempts to continue reading. If the next read
is also bad, the skip size is doubled, and it will keep doubling until
it hits the maximum of 1GiB or 1% of the drive size, whichever is
smaller. When it reaches the end of the drive, it then does the same
thing backwards (pass 2), reading only the areas that were marked as
non-tried (skipped).
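To make the defaults concrete, here is a rough sketch (the device and
file names are placeholders, as in the example command above):

  # First pass with all defaults: 64KiB (128-sector) reads, 64KiB initial skip
  ddrescue /dev/sda image.dd image.log
  # Consecutive bad reads double the skip size:
  #   64KiB -> 128KiB -> 256KiB -> ... -> min(1GiB, 1% of drive size)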
Before we get into copy pass 3, let’s look at the first two passes.
The first pass is designed to skip out of bad areas as fast as
possible. However, as the skip size grows, it is possible to skip past
a big chunk of good data before reading starts again. Since the second
pass does the same thing backwards, it should normally catch most of
the good data that sat at the end of bad areas from the first pass.
You may notice that reverse reads are much slower than forward reads.
This is because drives normally have a look-ahead feature that reads
ahead and stores the data in an internal buffer, and this only works
when reading forwards. If you send a special command to the drive to
turn off this feature, you will find that forward and reverse reads
run at about the same speed.
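For example, on Linux one common way to send that command is hdparm's
-A flag (a sketch only; not every drive or USB bridge honors it):

  # Turn off the drive's read look-ahead, then back on when done
  hdparm -A0 /dev/sda
  hdparm -A1 /dev/sda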
Now it would help to understand how the data is stored on the
platters. A typical disk can have between 1 and 4 platters, and 2 to 8
heads. The data is actually stored in small groups that could be 100MB
or less, up to 1GB or more, depending on the drive. So for example, if
the group size were exactly 100MB, then on a 2-platter, 4-head drive
the first 0-100MB would be read from head 1, 100-200MB from head 2,
200-300MB from head 3, and 300-400MB from head 4. Then the next
400-500MB would go back to head 1, and so on. So as you can see, the
data is not all in straight-line order. There are normally two basic
hard drive errors
(ones that can be worked with using ddrescue). The first is a damaged
area on one of the platters. The size of this error can vary, and the
error can span multiple groups on the head. A damaged platter can also
cause head damage (or further head damage) when the head passes over
it. The less time spent in this area, the better. The second common
error is a weak or damaged head, which will affect reads across the
entire disk. I have seen more than one logfile that shows this: there
are usually many small errors spaced a bit apart, usually with
something of a pattern that can only be seen by examining the logfile.
You can use ddrescueview to see a visual reference of the errors
caused by the bad head, and you can also use it to get an idea of the
head's data group size.
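As a sketch of that kind of inspection (file names are placeholders;
each data line in the logfile is “position size status”, where '-'
marks bad sectors and '*' marks non-trimmed blocks):

  # Visual error map of the rescue so far
  ddrescueview image.log
  # Or list the positions/sizes of the problem areas to eyeball their
  # spacing, which hints at the head's data group size
  awk '$3 == "-" || $3 == "*" {print $1, $2}' image.log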
So how can we best deal with this? I like to think that the
skip-out-fast method would usually be the best. This method involves
using the --skip-size option to set both the skip size and the max
skip size. By default the skip size is 64KiB and the max is either
1GiB or 1% of the drive size, whichever is smaller. So, for example,
if we use ddrescueview (or examine the logfile) early on in the rescue
and estimate from the error pattern that the data group size is about
100MB, then we might want to go with something like a 5Mi skip
size with a 10Mi max ("--skip-size=5Mi,10Mi"). We want to keep
skipping out of the bad head as fast as possible on the first pass,
but don't want to skip way too far out if we can help it. The untried
area that is skipped past, away from the bad head, will get processed
by the reverse pass (a nice benefit of that pass). This means we can
skip out big and fast if we want to, but understand that reverse reads
are usually slower than forward reads. You also don't want to allow
skipping more than halfway to the next bad read, or good data could be
missed on the reverse pass and would have to wait for the third,
no-skip pass. The skip-out-fast method will also work for a damaged
area on the platter, although you will likely not know the group size
in advance. The big benefit of this method is getting the
most good data as fast as possible before working on the problem areas.
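Putting it together, a skip-out-fast run for that estimated ~100MB
group size might look like this (device and file names are again
placeholders):

  # 5MiB initial skip, 10MiB max: bail out of a bad-head group quickly
  # on the first pass without overshooting too far past it
  ddrescue --skip-size=5Mi,10Mi /dev/sda image.dd image.log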
We have only covered the first two copy passes, but that is enough for
the first post (I am losing focus). More to come soon…
On 1/4/2018 6:02 AM, Peter Clifton wrote:
Hi,
I've been dumping a disk with ddrescue for a friend, and it occurred to me that
one feature present in hardware based / proprietary recovery tools (as far as I
could discern from watching youtube videos of professional recovery), is
bad-head mapping.
The pattern of slow / bad reads from this particular disk appears to be 75%
good, 25% bad, in a fairly regular pattern. I know the disk has 2x platters, 4x
heads, so this suggests (possibly), a damaged region of one platter face, or
one read head wearing or damaged more significantly than the others.
I was curious as to whether you had suggestion how (or interest in adding a
feature), to have ddrescue focus on the 3/4 of the disk which is more readily
accessible.