Even though I initially slammed the idea of skipping "whitespace", I
have thought more about it, and will offer a possible theory of
operation, should it ever be implemented. I still say it would be
difficult to implement, and would only be feasible in certain situations.
The definition of whitespace would be areas filled completely with
zeros, meaning the entire cluster being read must be processed to see
whether any byte is non-zero. If a non-zero byte is found, the
processing of that cluster stops, and the cluster is considered used.
If it is all zeros, it is considered whitespace. This would add some
overhead to the program, although it is unclear how much it would
affect performance.
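
As a rough illustration, the blank-cluster test could look something
like the sketch below. This is hypothetical code written for this
message, not anything from ddrescue; the name cluster_is_blank is made
up. It bails out at the first non-zero byte, so clusters that actually
hold data cost almost nothing extra to check:

    #include <cstddef>
    #include <cstdint>

    // Return true only if every byte of the cluster buffer is zero.
    // Stops at the first non-zero byte, keeping the overhead low for
    // clusters that contain data.
    bool cluster_is_blank( const std::uint8_t * buf, std::size_t size )
      {
      for( std::size_t i = 0; i < size; ++i )
        if( buf[i] != 0 ) return false;    // data found; cluster is used
      return true;                         // all zeros; possible whitespace
      }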
Once it is determined that a number of zero-filled clusters have been
read in a row, that could trigger a form of skipping. The skipping would
end and be reset once a non-zero cluster was found. How much to skip is
the question; since you are skipping for a different reason than a bad
spot, you don't want to get crazy with the skipping, and it must be
reasonably limited. The skipped data could be read backwards after data
was found, or maybe a reverse pass would be better.
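
To make that concrete, here is one possible shape for the trigger.
Again this is an invented sketch, not ddrescue's algorithm: the helper
read_cluster, the run threshold, and the skip cap are all assumptions
chosen only to show the bounded-skip idea:

    #include <cstddef>
    #include <cstdint>

    // Assumed helpers, declared for the sketch; see the blank test above.
    bool cluster_is_blank( const std::uint8_t * buf, std::size_t size );
    void read_cluster( long long pos, std::uint8_t * buf, long long size );

    const int blank_run_trigger = 16;       // blank clusters in a row
                                            // before skipping starts
    const long long max_skip = 1LL << 20;   // cap each jump at 1 MiB so
                                            // the skipping can't get crazy

    void forward_pass( long long device_size, long long cluster_size )
      {
      std::uint8_t buf[65536];
      int blank_run = 0;
      long long pos = 0;
      while( pos + cluster_size <= device_size )
        {
        read_cluster( pos, buf, cluster_size );
        if( cluster_is_blank( buf, cluster_size ) )
          {
          if( ++blank_run >= blank_run_trigger )
            { pos += max_skip; blank_run = 0; continue; }  // bounded skip
          }
        else blank_run = 0;                 // non-zero cluster; reset run
        pos += cluster_size;
        }
      }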
That all sounds great, until you try to implement it alongside the
normal skipping algorithm for bad blocks. It suddenly gets very
complicated, as you have to figure out what to do when you have both
bad blocks and whitespace. It must also be decided what size dictates
possible whitespace. If you based it on a number of empty clusters,
what happens when the user changes the cluster size to 1? That could
cause premature skipping, so there would need to be a size value
provided to base skipping on. And do you keep separate track of areas
skipped because of bad/slow blocks and areas skipped due to suspected
whitespace? If so, how is that best processed in further passes?
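
One illustrative answer (invented here, and not ddrescue's mapfile
format) would be to record why each area was skipped, so a later pass
could cheaply re-read suspected whitespace before grinding on known-bad
areas, and to express the trigger in bytes rather than clusters so a
cluster size of 1 cannot fire it prematurely:

    // Hypothetical bookkeeping: tag each skipped area with the reason,
    // and base the trigger on a byte count instead of a cluster count.
    const long long min_blank_bytes = 1LL << 16;   // 64 KiB of zeros
                                                   // before skipping starts

    enum class Skip_reason { bad_block, slow_area, whitespace };

    struct Skipped_area
      {
      long long pos, size;
      Skip_reason reason;   // a later pass could retry 'whitespace'
      };                    // areas first; they are probably fast reads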
And all of this is based on the assumption that large chunks of zeros
are actually unused space. While that is most likely true, it cannot
always be assumed. This would most likely work best on large drives
with only a small percentage of space used. With modern drive sizes
growing, I guess that condition is becoming more likely: there could be
large areas of the drive that have never been written to since the
drive was put into use. Filesystems do tend to clump things together,
but there is no guarantee that you would not skip good data. Then
again, the point could be made that you can also skip good data when
skipping due to bad blocks.
So is this a good idea? I don't know. It is like a poor man's version of
processing the filesystem. My initial instinct is that it is not the
best idea, but I guess it could work in some cases if done right.
Regards,
Scott
On 1/27/2017 2:47 PM, Antonio Diaz Diaz wrote:
> Thanks to all for the feedback.
>
> I tend to agree with Scott in that skipping unused space can't
> possibly work with any sort of consistency. Therefore I'll forget
> about it until someone shows with data that it can be useful. For
> example showing a correspondence between unused sectors and sectors
> containing the empty pattern, plus a bitmap showing that the used
> sectors are grouped. If the used sectors are scattered, then finding
> them is, as Scott said, like playing roulette.
>
> Thanks,
> Antonio.
_______________________________________________
Bug-ddrescue mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-ddrescue