Dear Jakob,

On Wed, 11 Nov 2020 at 20:03, Jakob Bohm wrote:

> On 2020-11-11 00:39, I. Hope Nothing wrote:
> > I have a large (183 GB) .tar file that has become corrupted.
> > [...]
> >
> > There's a lot of binary data I want to keep on here.  I am willing and
> > keen to learn how to forensically retrieve my data, and I would
> > greatly appreciate any help pointing me in the right direction.  Thank
> > you for reading this far already!!
>
> These are simple hints for attempting manual rescue.


Thank you for your answer already.  I have been thinking about what you
wrote, and other things that came to mind even before I posted.  I am well
aware of the magnitude and complexity of this task.

In the best case scenario I have been hoping that whatever solution I come
up with could be made into a generalized "damaged .tar file fixer upper".


> 1. If possible, obtain a less corrupted copy of the tar file.
>    For example, if it was corrupted when extracting it from a tape
>    over ssh or rlogin, try extracting it again using a binary-safe
>    protocol.  Similarly if it was corrupted after decompressing with
>    gzip, bzip2 or any other such tool, try decompressing again.
>

This is not possible, unfortunately :-(


> 2. Try to obtain a dos2unix implementation that doesn't try to be
>    "smart", basically, you need to do a binary search replace from
>    \r\n to \n while leaving alone any other bytes with the value 13.
>

The implementation I have is from
http://waterlan.home.xs4all.nl/dos2unix.html (according to the man page).


>    This will still lose any \r\n sequence that was in the original
>    data, but there will probably be less corruption than in the file
>    that was erroneously subjected to the opposite search replace.
>

Sure.  Assuming that what has happened is what I think probably did
happen...
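
A minimal sketch of the "dumb" binary search-and-replace described in step 2, assuming the only damage is LF having been expanded to CR LF.  The function name `crlf_to_lf` and the chunk size are my own choices for illustration, not anything taken from dos2unix.  It streams the file so that a 183 GB archive never has to fit in memory, and it leaves every lone byte 13 untouched:

```python
def crlf_to_lf(src, dst, chunk_size=1 << 20):
    """Copy src to dst, replacing every b"\r\n" with b"\n".

    Lone CR bytes (13 not followed by 10) are copied unchanged.
    Returns the number of replacements made.
    """
    replaced = 0
    carry = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = carry + chunk
        # Hold back a trailing CR: its LF partner may start the next chunk.
        if data.endswith(b"\r"):
            data, carry = data[:-1], b"\r"
        else:
            carry = b""
        replaced += data.count(b"\r\n")
        dst.write(data.replace(b"\r\n", b"\n"))
    dst.write(carry)  # a CR at the very end of the file had no partner
    return replaced
```

Both files would need to be opened in binary mode (`"rb"` and `"wb"`), and the output written to a new file so that the damaged original stays untouched.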

I think that line endings got mangled during a botched FTP transfer.  More
worryingly, for some reason there is also a "Password:" prompt as Line 1,
which concerns me because I wonder if there is more to the damage than
simply mangled line endings, e.g., perhaps STDOUT or STDERR got redirected
somewhere it shouldn't have.

Please correct me if I'm wrong, but in the simplest assumed case, the one
potentially irreversibly mangled case is where there was a 13 ('\r') that
was NOT at the end of a text line but which was part of binary data and
therefore converted to 13 10 ('\r' '\n').  Here I need to implement my own
forensic logic, possibly based on probabilistic methods, so that this gets
converted back to plain 10 ('\n'), where the correctness of this
conversion is judged by:

-   the .tar file listing its contents without error;
-   extracting without error; and
-   the file where that conversion-and-then-reverse-conversion is located
    is apparently functioning properly after extraction (there are many ways
    this can be tested).
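
The first criterion can be roughly approximated without running tar at all.  The sketch below walks the chain of 512-byte headers, verifies each header checksum, and reports the offset where the chain first breaks.  The names `field_to_int`, `header_ok` and `first_break` are hypothetical helpers of mine, and the walk assumes every member's data simply follows its header padded to a multiple of 512 bytes, which ignores old-style GNU sparse headers:

```python
BLOCK = 512  # tar headers and data blocks are 512 bytes each


def field_to_int(field):
    """Decode a numeric header field (octal text, or GNU base-256)."""
    if field[0] & 0x80:  # base-256 form, used for values too big for octal
        return int.from_bytes(field[1:], "big")
    return int(field.strip(b" \0") or b"0", 8)


def header_ok(block):
    """True if a 512-byte block carries a matching header checksum."""
    try:
        stored = field_to_int(block[148:156])
    except ValueError:
        return False
    # The checksum is computed with its own 8-byte field read as spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return stored == computed


def first_break(data):
    """Return the offset of the first bad header, or None if the walk
    reaches an all-zero block (the end-of-archive marker)."""
    pos = 0
    while pos + BLOCK <= len(data):
        block = data[pos:pos + BLOCK]
        if block == bytes(BLOCK):
            return None
        if not header_ok(block):
            return pos
        try:
            size = field_to_int(block[124:136])
        except ValueError:
            return pos
        pos += BLOCK + -(-size // BLOCK) * BLOCK  # header + padded data
    return pos  # ran off the end without seeing the marker
```

Here `data` is anything that supports `len()` and slicing to bytes; for a file this large an `mmap.mmap` object would presumably be more practical than reading it all in.  A clean walk only shows that the headers line up, not that the file contents in between are intact.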

> 3. Look up the tar file format specifications, it is actually a
>    relatively simple file format and you will need to understand it
>    to do the manual data rescue.  In particular, you will need to
>    understand the PAX and GNU extensions to the format.
>

This is one of the first things that came to my mind.  So far I know of the
following sources of information:

-   The GNU Tar documentation and source code
-   Schily's star documentation and source code

**If you know of any other source code or specifications I should be aware
of, please let me know.**
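
To make the header layout concrete, here is a small sketch that decodes the fields of one 512-byte header that matter most for a rescue.  The offsets are the standard ustar ones; the function name `decode_header` and the returned dictionary are my own illustration.  It does not verify the checksum, and the `prefix` field is only meaningful when the magic is the POSIX `ustar\0` form, since GNU-format headers use that area differently:

```python
# Common typeflag values; anything else is reported as "other".
TYPEFLAGS = {
    "0": "regular file",
    "\0": "regular file (old style)",
    "5": "directory",
    "x": "pax extended header",
    "g": "pax global header",
    "L": "GNU long name",
    "K": "GNU long link name",
}


def decode_header(block):
    """Decode selected fields of a 512-byte tar header block."""

    def text(field):
        # Text fields are NUL-terminated unless they fill the whole field.
        return field.split(b"\0", 1)[0].decode("utf-8", "replace")

    def number(field):
        if field[0] & 0x80:  # GNU base-256 form for very large values
            return int.from_bytes(field[1:], "big")
        return int(field.strip(b" \0") or b"0", 8)

    typeflag = block[156:157].decode("latin-1")
    return {
        "name": text(block[0:100]),
        "size": number(block[124:136]),      # bytes of data that follow
        "checksum": number(block[148:156]),
        "typeflag": typeflag,
        "kind": TYPEFLAGS.get(typeflag, "other"),
        "linkname": text(block[157:257]),
        "magic": block[257:263],             # b"ustar\0" or b"ustar "
        "prefix": text(block[345:500]),
    }
```

A header with typeflag `x`, `L` or `K` describes the *next* member, so the real file name may live in the data blocks of the preceding entry rather than in the `name` field.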

> 4. Using a binary file viewer, look for the tar header that marks
>    the start of a much wanted file.  Then look for the tar header
>    of the next file in the archive.  The bytes between the two
>    headers are supposed to be your file contents and the header
>    before the contents should give the number of bytes in the
>    uncorrupted file.  If you did step 2 above, the actual data
>    will probably be slightly too short due to too many removed \r
>    characters, or due to the terminal protocol also removing some
>    other bytes.
>

What binary file viewers do you recommend?

Up until now the only binary file viewers I've used were `od` and
`hexl-mode` in Emacs, and casually at that.  If you have better
suggestions, I'd appreciate it!!
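
As a complement to a viewer, the header hunt in step 4 could be partly automated.  The sketch below assumes the archive uses ustar-style headers, so that the string `ustar` sits 257 bytes into each header; the generator name `candidate_headers` is hypothetical.  It does not assume headers are still on 512-byte boundaries, because inserted or removed bytes will have shifted them:

```python
def candidate_headers(data, magic=b"ustar"):
    """Yield (offset, name) for every place that looks like a tar header.

    The magic string lives at byte 257 of a header, so each hit is
    reported as starting 257 bytes earlier.
    """
    pos = data.find(magic)
    while pos != -1:
        start = pos - 257
        if start >= 0:
            raw_name = data[start:start + 100].split(b"\0", 1)[0]
            yield start, raw_name.decode("utf-8", "replace")
        pos = data.find(magic, pos + 1)
```

Every hit is only a candidate: file contents that happen to contain `ustar` give false positives, and a byte inserted or removed inside the first 257 bytes of a header shifts the reported start.  `data` can be a bytes object or, for a file of this size, probably an `mmap.mmap`, which offers the same `find` and slicing.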


> 5. Use knowledge of your actual file format to figure out where
>    an \r was probably lost and use the correct file length from
>    the tar header as a cross check of your efforts.
>

See my reasoning above about sanity checking the results of step 2.
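
The length cross-check in step 5 comes down to simple arithmetic.  A sketch, assuming the two header offsets were found by hand or by a scan and that no extension header sits between them (the name `length_drift` is mine):

```python
def length_drift(header_offset, declared_size, next_header_offset, block=512):
    """Bytes found between two headers minus the bytes the first header
    promises.

    0 means consistent, a negative value means bytes are missing
    (for example too many \\r removed), a positive value means surplus
    bytes (for example \\r\\n pairs not yet converted back).
    """
    padded = -(-declared_size // block) * block  # data rounded up to blocks
    expected_next = header_offset + block + padded
    return next_header_offset - expected_next
```

For example, a header at offset 0 declaring 700 bytes should be followed by the next header at 512 + 1024 = 1536; finding it at 1537 gives a drift of 1, i.e. one stray byte somewhere in that member's header or data.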


> 6. Repeat steps 4 and 5 for each file.
>

Yes.

> Good luck, you will need it.
>

Thanks again Jakob.  As I mentioned before, I'm hoping that something
positive can come out of this forensic work.

Kind regards,

I. Hope
