On 2020-11-11 00:39, I. Hope Nothing wrote:
> Hello all,
>
> I have a large (183 GB) .tar file that has become corrupted. This is
> actually the _secondary_ backup of this data. The primary backup (a
> USB HDD) was lost, so I was disappointed to find that _this_ backup
> isn't easily accessible.
>
> From inspection and memory, it seems that this .tar file was corrupted
> by a poorly invoked file transfer operation, e.g., FTP with mixed up
> ASCII/binary settings. Each line ends with '^M' before the '\n', and
> because this tarball has a lot of binary data in it `dos2unix -f` is
> unlikely to restore all occurrences of mangled line endings.
>
> The first line of the .tar file is "Password:", and I can think of
> several possibilities as to how this could have happened.
>
> I have made a copy of the file to perform surgery on it.
>
> Unsurprisingly, the results of `dos2unix -f corrupted_tar_file.tar`
> crash out after only a couple of dozen entries when listing: `tar tvf
> corrupted_tar_file_unix_eol.tar`.
>
> There's a lot of binary data I want to keep on here. I am willing and
> keen to learn how to forensically retrieve my data, and I would
> greatly appreciate any help pointing me in the right direction. Thank
> you for reading this far already!!
>
> If you need transcripts of anything please let me know!!

Here are some simple hints for attempting a manual rescue.
1. If possible, obtain a less corrupted copy of the tar file.
For example, if it was corrupted when extracting it from a tape
over ssh or rlogin, try extracting it again using a binary-safe
protocol. Similarly, if it was corrupted after decompressing with
gzip, bzip2 or any other such tool, try decompressing again.
2. Try to obtain a dos2unix implementation that doesn't try to be
"smart". Basically, you need a binary search-and-replace from
\r\n to \n that leaves alone any other bytes with the value 13.
This will still lose any \r\n sequence that was in the original
data, but there will probably be less corruption than in the file
that was erroneously subjected to the opposite search-and-replace.
3. Look up the tar file format specifications. It is actually a
relatively simple file format, and you will need to understand it
to do the manual data rescue. In particular, you will need to
understand the PAX and GNU extensions to the format.
4. Using a binary file viewer, look for the tar header that marks
the start of a much-wanted file. Then look for the tar header
of the next file in the archive. The bytes between the two
headers are supposed to be your file contents and the header
before the contents should give the number of bytes in the
uncorrupted file. If you did step 2 above, the actual data
will probably be slightly too short due to too many removed \r
characters, or due to the terminal protocol also removing some
other bytes.
5. Use knowledge of your actual file format to figure out where
a \r was probably lost, and use the correct file length from
the tar header as a cross-check of your efforts.
6. Repeat steps 4 and 5 for each file.
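
As a rough illustration of steps 2, 4 and 5, here is a minimal Python
sketch, not a finished rescue tool. The function names (crlf_to_lf,
find_headers, size_mismatches) are made up for this example. It assumes
the headers carry the "ustar" magic at offset 257 (true for POSIX and
GNU archives, not for very old v7 tar), that size fields are plain
octal (GNU's base-256 encoding for very large members is not handled),
and that `data` is a bytes object or something bytes-like that supports
find() and slicing, such as an mmap of the file. Note that if the damage
was a pure \n to \r\n expansion, the reverse replacement is exact; if
the transfer left existing \r\n pairs alone or dropped other bytes,
some members come out short, and the last function flags those.

```python
BLOCK = 512  # a tar archive is a sequence of 512-byte blocks


def crlf_to_lf(src, dst, chunk_size=1 << 20):
    """Copy src to dst, turning every b"\\r\\n" into b"\\n".

    Lone \\r bytes are left untouched. A \\r at the end of a chunk is
    held back so that a pair split across two reads is still caught.
    """
    pending_cr = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            chunk = b"\r" + chunk
        pending_cr = chunk.endswith(b"\r")
        if pending_cr:
            chunk = chunk[:-1]
        dst.write(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        dst.write(b"\r")  # trailing lone \r at end of input


def parse_octal(field):
    """Decode a NUL/space-terminated octal header field."""
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def find_headers(data):
    """Yield (offset, name, size) for each checksum-valid ustar header.

    Headers are searched at every byte offset, not only at multiples
    of 512, because lost bytes shift everything that follows.
    """
    pos = data.find(b"ustar")
    while pos != -1:
        start = pos - 257  # the magic sits 257 bytes into the header
        header = data[start:start + BLOCK] if start >= 0 else b""
        if len(header) == BLOCK:
            try:
                stored = parse_octal(header[148:156])
                size = parse_octal(header[124:136])
            except ValueError:
                stored = -1  # unparsable field: treat as invalid
            # The checksum is the byte sum of the header with the
            # checksum field itself counted as eight spaces.
            computed = sum(header[:148]) + 8 * 0x20 + sum(header[156:])
            if stored == computed:
                raw_name = header[:100].split(b"\0", 1)[0]
                yield start, raw_name.decode("utf-8", "replace"), size
        pos = data.find(b"ustar", pos + 1)


def size_mismatches(data):
    """Return (name, declared_size, delta) per member except the last.

    delta is the number of bytes found between this header and the
    next, minus the declared size rounded up to whole blocks. A
    negative delta means that many bytes are missing.
    """
    headers = list(find_headers(data))
    rows = []
    for (start, name, size), (next_start, _, _) in zip(headers, headers[1:]):
        padded = -(-size // BLOCK) * BLOCK  # round up to a block
        actual = next_start - (start + BLOCK)
        rows.append((name, size, actual - padded))
    return rows
```

Members reported with a delta of 0 can be cut out as they are; the ones
with a negative delta are the candidates for the manual repair in step
5. PAX and GNU long-name records also show up in the list, since they
are stored as ordinary headers with their own size.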
Good luck, you will need it.
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded