Matt, I’m glad that you managed to recover your data.
Now go set up a proper backup program so that you never have to go through
that again. Do it today!

Regards,
John Ralls

> On Jun 30, 2017, at 2:01 PM, [email protected] wrote:
>
> Max, all,
>
> Thank you for the pointers and help. I'm pleased to say that I seem to
> have recovered my data. Still remaining to be found are the customizations
> I made to the standard report. But if what I have found stands up to some
> checks against known bank balances, etc., then I won't be too far off of
> where I was a month ago.
>
> What I have been trying to sift through (to recap a bit) is the result of
> a recovery done with testdisk/photorec, which left a blizzard of files and
> file fragments on a multi-terabyte hard drive. By and large the filenames
> were lost (though not in all cases), and photorec uses a list of known
> file signatures to try to append the appropriate file extension. This
> largely works, but not always. Finally, if I had known of a definitive
> file signature *before* I started the recovery, that might have helped.
> But for text-oriented files (vs. JPEGs, PDFs, executables, etc.) that's
> not always reliable or available.
>
> Fortunately, photorec seems to recognize XML and xml.gz formatted files.
> Diving head first into a pool I hadn't been in before, I came up with bash
> scripts (this is a Linux machine I'm working on) to do recursive searches.
> Basically, I would open a terminal window, run
>
>     gedit ~/.bashrc
>
> and add the following to the end:
>
>     function odsgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.ods"); do
>             echo "$file"
>             unzip -p "$file" content.xml | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null
>             if [ $? -eq 0 ]; then
>                 echo "FOUND FILE $file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function mattpdfgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.pdf"); do
>             pdftotext -htmlmeta "$file" - | grep --with-filename --label="$file" --color -i -F "$term"
>             if [ $? -eq 0 ]; then
>                 echo "$file"
>                 pdfinfo "$file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function mattxlsgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.xlsx"); do
>             xlsx2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term"
>             if [ $? -eq 0 ]; then
>                 echo "$file"
>             fi
>         done
>         for file in $(find . -name "*.xls"); do
>             xls2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term"
>             if [ $? -eq 0 ]; then
>                 echo "$file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function mattxmlgzgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.xml.gz"); do
>             gunzip -c "$file" | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null
>             if [ $? -eq 0 ]; then
>                 echo "FOUND FILE $file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function matttxtgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.txt"); do
>             grep -i -F "$term" "$file" > /dev/null
>             if [ $? -eq 0 ]; then
>                 echo "FOUND FILE $file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
> These custom commands (built from a 'net search that turned up a variant
> of the first one) allow for recursive file searches as well as subsequent
> unzipping and string-search operations. Importantly, they attempt to look
> inside spreadsheets and PDFs, which aren't otherwise "grep-able".
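All of these functions follow the same pattern: convert each matching file to plain text, then grep it. That pattern can be factored into one generic helper; the sketch below is not part of the original scripts (the `convgrep` name and argument order are invented here), and it assumes the relevant converters are installed:

```shell
# convgrep: generic "convert, then grep" search over files matching a glob.
# Usage: convgrep GLOB TERM CONVERTER [CONVERTER-ARGS...]
# The converter is invoked with the filename appended and must write
# plain text to stdout.
convgrep() {
    local glob="$1" term="$2"
    shift 2
    # -print0 / read -d '' handles filenames containing spaces or newlines,
    # which the IFS=$'\n' trick in the functions above only partly covers
    find . -name "$glob" -print0 | while IFS= read -r -d '' file; do
        if "$@" "$file" 2>/dev/null | grep -q -i -F "$term"; then
            echo "FOUND FILE $file"
        fi
    done
}

# Examples (the first two mirror matttxtgrep and mattxmlgzgrep):
#   convgrep '*.txt'    'some term'        cat
#   convgrep '*.xml.gz' '<ts:date>2017-03' gunzip -c
#   convgrep '*.xlsx'   'some term'        xlsx2csv
```

Note that pdftotext wants the output argument `-` *after* the filename, so it would need a one-line wrapper script to fit this calling shape.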
>
> To find the data, I used the mattxmlgzgrep routine to search *backwards*
> in time. The first search was for
>
>     <ts:date>2017-06
>
> It found no files, which was expected, since I had last worked on this
> account in March or April, around US tax season. The next search, for
> <ts:date>2017-05, also turned up nothing. But searching for
> <ts:date>2017-04 turned up 1 hit, and <ts:date>2017-03 turned up a large
> number. So even though the timestamp on the file was dated as of the
> recovery, by searching backwards for entries I was able to narrow things
> down.
>
> Examining the file in gnucash (it seemed to have been pulled in cleanly)
> showed all the categories, accounts, data, etc. that I expected to see.
>
> It would be great to find the files related to the standard report
> customizations, and I'll spend a little time trying to do that. I'm not
> sure yet what would be a suitable "marker", but I think I have a candidate
> or two. After that I need to find the other records that made up some of
> this workflow. Fortunately, they were all digital to begin with, and I
> believe I still have access online.
>
> Thanks again for everyone's help. If there's anything I can share in
> return, let me know.
>
> Matt
>
> On 2017-06-30 14:06, [email protected] wrote:
>> Dear Matt:
>>
>>> The problem is that the recovery operation (using Testdisk/Photorec)
>>> results in files and file fragments that may or may not be correctly
>>> identified by file extensions.
>>
>> It sounds like what you want is a magic number (file-format ID:
>> https://en.wikipedia.org/wiki/File_format#Magic_number) for .gnucash
>> files. Looking at my file it appears that ``<gnc-v2'' starting at the
>> 41st character in the file would do it. (I presume the `2' in ``-v2''
>> is a version number, and could change at some future date, but for now
>> that's not a problem.)
>>
>> It would be nice if the recovery program lets you add to the file-ID
>> list; otherwise you're back to grep.
>> I hope that it recognizes gzipped files (possible GnuCash files,
>> compressed), but if not, you want to look for the first two bytes
>> = 0x1f 0x8b. Of course, then you'll have to unzip them to see whether
>> they're really what you want. :-/
>>
>> Gurus: Is this right? For future-proofing, can we assume the magic
>> number will always be in position 41? Is there an actual, designated,
>> magic number for GnuCash files somewhere?
>>
>> Hope this makes sense/helps...
>>
>> Best wishes,
>> Max Hyre
>
> _______________________________________________
> gnucash-user mailing list
> [email protected]
> https://lists.gnucash.org/mailman/listinfo/gnucash-user
> -----
> Please remember to CC this list on all your replies.
> You can do this by using Reply-To-List or Reply-All.
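Max's two checks, the gzip magic bytes 0x1f 0x8b and the `<gnc-v2` marker near the start of the XML, can be combined into a small shell function for triaging recovered files. This is only a sketch under Max's assumptions: the `check_gnucash` name is invented here, and as he notes, no official GnuCash magic number is guaranteed.

```shell
# check_gnucash FILE
# Heuristically decide whether a recovered file looks like GnuCash data,
# either as plain XML or gzip-compressed (hypothetical helper).
check_gnucash() {
    local file="$1"
    # gzip streams begin with the two magic bytes 0x1f 0x8b
    if [ "$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
        # decompress only the start and look for the <gnc-v2> root element
        if gunzip -c "$file" 2>/dev/null | head -c 200 | grep -q '<gnc-v2'; then
            echo "$file: likely gzipped GnuCash data"
            return 0
        fi
    elif head -c 200 "$file" | grep -q '<gnc-v2'; then
        echo "$file: likely uncompressed GnuCash data"
        return 0
    fi
    return 1
}
```

It can be driven from a loop in the style of Matt's scripts, e.g. `for f in $(find . -type f); do check_gnucash "$f"; done`. Searching the first 200 bytes, rather than exactly position 41, hedges against the marker shifting in future file versions.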
