Matt, I’m glad that you managed to recover your data.
Now go set up a proper backup program so that you never have to go through
that again. Do it today!

Regards,
John Ralls

> On Jun 30, 2017, at 2:01 PM, [email protected] wrote:
>
> Max, all,
>
> Thank you for the pointers and help. I'm pleased to say that I seem to
> have recovered my data. Still remaining to be found are the customizations
> I made to the standard report. But if what I have found stands up to some
> checks against known bank balances, etc., then I won't be too far off of
> where I was a month ago.
>
> What I have been trying to sift through (to recap a bit) is the result of
> a recovery done with testdisk/photorec, which left a blizzard of files and
> file fragments on a multi-terabyte hard drive. By and large the filenames
> were lost (though not in all cases), and photorec uses a list of known
> file signatures to try to append the appropriate file extension. This
> largely works, but not always. Finally, if I had known of a definitive
> file signature *before* I started the recovery, that might have helped.
> But for text-oriented files (vs. JPEGs, PDFs, executables, etc.) that's
> not always reliable or available.
>
> Fortunately, photorec seems to recognize XML and xml.gz formatted files.
> Diving head first into a pool I hadn't been in before, I came up with bash
> scripts (this is a Linux machine I'm working on) to do recursive searches.
> Basically, I would open a terminal window, run
>
>     gedit ~/.bashrc
>
> and add the following to the end:
>
>     function odsgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.ods"); do
>             echo "$file"
>             unzip -p "$file" content.xml | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null
>             if [ $? -eq 0 ]; then
>                 echo "FOUND FILE $file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function mattpdfgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.pdf"); do
>             pdftotext -htmlmeta "$file" - | grep --with-filename --label="$file" --color -i -F "$term"
>             if [ $? -eq 0 ]; then
>                 echo "$file"
>                 pdfinfo "$file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function mattxlsgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.xlsx"); do
>             xlsx2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term"
>             if [ $? -eq 0 ]; then
>                 echo "$file"
>             fi
>         done
>         for file in $(find . -name "*.xls"); do
>             xls2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term"
>             if [ $? -eq 0 ]; then
>                 echo "$file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function mattxmlgzgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.xml.gz"); do
>             gunzip -c "$file" | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null
>             if [ $? -eq 0 ]; then
>                 echo "FOUND FILE $file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
>     function matttxtgrep() {
>         term="$1"
>         echo "Start search : $term"
>         OIFS="$IFS"
>         IFS=$'\n'
>         for file in $(find . -name "*.txt"); do
>             grep -i -F "$term" "$file" > /dev/null
>             if [ $? -eq 0 ]; then
>                 echo "FOUND FILE $file"
>             fi
>         done
>         IFS="$OIFS"
>         echo "Finished search : $term"
>     }
>
> These custom commands (built from a 'net search that turned up a variant
> of the first one) allow for recursive file searches as well as subsequent
> unzipping and string-search operations. Importantly, they attempt to look
> inside spreadsheets and PDFs, which aren't otherwise "grep-able".
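All of these functions follow the same pattern: convert each matching file to plain text, then grep it. That pattern can be factored into one generic helper; the sketch below is not part of the original scripts (the `convgrep` name and argument order are invented here), and it assumes the relevant converters are installed:

```shell
# convgrep: generic "convert, then grep" search over files matching a glob.
# Usage: convgrep GLOB TERM CONVERTER [CONVERTER-ARGS...]
# The converter is invoked with the filename appended and must write
# plain text to stdout.
convgrep() {
    local glob="$1" term="$2"
    shift 2
    # -print0 / read -d '' handles filenames containing spaces or newlines,
    # which the IFS=$'\n' trick in the functions above only partly covers
    find . -name "$glob" -print0 | while IFS= read -r -d '' file; do
        if "$@" "$file" 2>/dev/null | grep -q -i -F "$term"; then
            echo "FOUND FILE $file"
        fi
    done
}

# Examples (the first two mirror matttxtgrep and mattxmlgzgrep):
#   convgrep '*.txt'    'some term'        cat
#   convgrep '*.xml.gz' '<ts:date>2017-03' gunzip -c
#   convgrep '*.xlsx'   'some term'        xlsx2csv
```

Note that pdftotext wants the output argument `-` *after* the filename, so it would need a one-line wrapper script to fit this calling shape.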
>
> To find the data, I used the mattxmlgzgrep routine to search *backwards*
> in time. The first search was for
>
>     <ts:date>2017-06
>
> It found no files, which was expected, since I had last worked on this
> account in March or April, around US tax season. The next search, for
> <ts:date>2017-05, also turned up nothing. But searching for
> <ts:date>2017-04 turned up 1 hit, and <ts:date>2017-03 turned up a large
> number. So even though the timestamp on the file was dated as of the
> recovery, by searching backwards for entries I was able to narrow things
> down.
>
> Examining the file in gnucash (it seemed to have been pulled in cleanly)
> showed all the categories, accounts, data, etc. that I expected to see.
>
> It would be great to find the files related to the standard report
> customizations, and I'll spend a little time trying to do that. I'm not
> sure yet what would be a suitable "marker", but I think I have a candidate
> or two. After that I need to find the other records that made up some of
> this workflow. Fortunately, they were all digital to begin with, and I
> believe I still have access online.
>
> Thanks again for everyone's help. If there's anything I can share in
> return, let me know.
>
> Matt
>
> On 2017-06-30 14:06, [email protected] wrote:
>> Dear Matt:
>>
>>> The problem is that the recovery operation (using Testdisk/Photorec)
>>> results in files and file fragments that may or may not be correctly
>>> identified by file extensions.
>>
>> It sounds like what you want is a magic number (file-format ID:
>> https://en.wikipedia.org/wiki/File_format#Magic_number) for .gnucash
>> files. Looking at my file it appears that ``<gnc-v2'' starting at the
>> 41st character in the file would do it. (I presume the `2' in ``-v2''
>> is a version number, and could change at some future date, but for now
>> that's not a problem.)
>>
>> It would be nice if the recovery program lets you add to the file-ID
>> list; otherwise you're back to grep.
>> I hope that it recognizes gzipped files (possible GnuCash files,
>> compressed), but if not, you want to look for the first two bytes
>> = 0x1f 0x8b. Of course, then you'll have to unzip them to see whether
>> they're really what you want. :-/
>>
>> Gurus: Is this right? For future-proofing, can we assume the magic
>> number will always be in position 41? Is there an actual, designated,
>> magic number for GnuCash files somewhere?
>>
>> Hope this makes sense/helps...
>>
>> Best wishes,
>> Max Hyre
>
> _______________________________________________
> gnucash-user mailing list
> [email protected]
> https://lists.gnucash.org/mailman/listinfo/gnucash-user
> -----
> Please remember to CC this list on all your replies.
> You can do this by using Reply-To-List or Reply-All.
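Max's two checks, the gzip magic bytes 0x1f 0x8b and the `<gnc-v2` marker near the start of the XML, can be combined into a small shell function for triaging recovered files. This is only a sketch under Max's assumptions: the `check_gnucash` name is invented here, and as he notes, no official GnuCash magic number is guaranteed.

```shell
# check_gnucash FILE
# Heuristically decide whether a recovered file looks like GnuCash data,
# either as plain XML or gzip-compressed (hypothetical helper).
check_gnucash() {
    local file="$1"
    # gzip streams begin with the two magic bytes 0x1f 0x8b
    if [ "$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
        # decompress only the start and look for the <gnc-v2> root element
        if gunzip -c "$file" 2>/dev/null | head -c 200 | grep -q '<gnc-v2'; then
            echo "$file: likely gzipped GnuCash data"
            return 0
        fi
    elif head -c 200 "$file" | grep -q '<gnc-v2'; then
        echo "$file: likely uncompressed GnuCash data"
        return 0
    fi
    return 1
}
```

It can be driven from a loop in the style of Matt's scripts, e.g. `for f in $(find . -type f); do check_gnucash "$f"; done`. Searching the first 200 bytes, rather than exactly position 41, hedges against the marker shifting in future file versions.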
