On 9/27/05, Carl Lowenstein <[EMAIL PROTECTED]> wrote:
> On 9/27/05, m ike <[EMAIL PROTECTED]> wrote:
> > On 9/27/05, Carl Lowenstein <[EMAIL PROTECTED]> wrote:
> > > On 9/27/05, m ike <[EMAIL PROTECTED]> wrote:
> > > > for extracting a portion of a file, the dd command can be hastened
> > > > dramatically (by a factor of 10,000) by changing the blocksize to bs=1024
> > > > (for example) and increasing count to be inclusive, and then piping
> > > > the result to head -c to trim it down to exact byte-size.
> > > >
> > > > 10,000 may be an exaggeration. okay it is an exaggeration. but does
> > > > not seem to be far off.
> > >
> > > There are two reasons for using large block sizes in dd.  One is to
> > > eliminate the overhead of e.g. issuing a million system calls each to
> > > read one byte, vs. one system call to read a million bytes.  The other
> > > is to reduce the effect of missing the "next block" in a disk read.
> > > If you have to wait for a whole disk revolution to read a block, your
> > > data transfer slows down proportional to the number of blocks per
> > > cylinder.  Nowadays this can range from 600 at the inner radius to
> > > 1200 at the outer.  (these are real physical blocks, not the fictional
> > > blocks that LBA software uses).
> >
> > fwiw, afaict, when one is grabbing a specific hunk within
> > a file, the largest bs= that can be specified is the greatest
> > common divisor of skip= and count=.
>
> Not if you first grab a large chunk and then skip and count within it
> for a smaller selection.

Exactly!  Sorry for the confusion.  That was my trick in the
initial post of this thread :) -- except that the first process
_starts_ accurately (but runs too long), and the second
process _ends_ accurately.
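
To spell the trick out with made-up numbers: to pull 1234567
bytes starting at byte offset 8192 out of FILE, instead of

 dd if=FILE bs=1 skip=8192 count=1234567 > chunk

one can do

 dd if=FILE bs=4096 skip=2 count=302 | head -c 1234567 > chunk

where skip is 8192/4096 and count is 1234567/4096 rounded up.
dd then starts on exactly the right byte, overshoots a little,
and head trims the tail to the exact size.  The start is only
exact when the offset is a multiple of the block size (that is
what the begin_error value in the final section checks).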

fwiw,  here is the updated worksheet.
changes are in the final section.


################################################################
# 2005-09-27
#
# for reasons of speed, I have changed the dissection idiom to:
#  dd | head -c
#
#
# 2005-09-16
# This is a worksheet that I developed to dissect intact all
# 116 jpgs from about 282MB of an accidentally reformatted
# and partially overwritten 512MB FAT16 CF card (Olympus c5050)
#
# The 282MB turned out to be un-fragmented in the sense that
# each JPG resided in a continuous stretch of disk space.
#
# Working in a bash shell, the basic approach is to use sed to
# make the jpg begin/close markers grep-able, then use grep to
# identify their byte-offsets, then use dd to dissect the jpg.
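#
# In outline (just a sketch of the idiom; the real commands,
# with their offset book-keeping, follow below):
#
#   sed -e 's/\xFF\xD8/\x0ABEGIN/'   # put the marker on its own line
#   grep -a -b ^BEGIN                # report the marker byte-offsets
#   dd ... | head -c ...             # cut the jpg out by offset/size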
#
########
#
# sed --version
#   GNU sed version 4.1.2
# grep --version
#   grep (GNU grep) 2.5.1
# bash --version
#   GNU bash, version 3.00.0(1)-release (i586-suse-linux)
#
# FF D8 is the beginning marker of a jpg.
# FF D9 is the closing marker of a jpg
# Since each jpg contains an embedded jpg thumbnail, there
# will be nested pairs of markers.
#
# grep's -b option will report the byte-offset of the line
# containing the match, not the offset of the match itself.
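# (a tiny illustration of that, with made-up input:
#    printf 'aaa\nbbbXccc\n' | grep -b X
#  prints "4:bbbXccc" -- 4 is where the line starts, not
#  where the X is)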
#
# This page got me started (thanks TsuruZoh Tachibanaya):
#
#  http://www.media.mit.edu/pia/Research/deepview/exif.html
#
################################################################




##########  MAKE A WORKING COPY OF THE FLASH CARD
##########  THIS STEP IS OPTIONAL

 cat /dev/sda1 > CF




##########  MAKE A SMALLER FILE TO WORK WITH
##########  THIS STEP IS OPTIONAL

# find the byte offset of the first jpg residing
# in the not-overwritten half of the card

# grab lines containing either the begin marker or exif
# date/time info.

 hexdump -C CF | grep -e "\ ff\ d8\ \|[0-9]:[0-9]" > CF_grepped_hexdump

# The hexdump takes a few minutes for 512MB
#
# To locate first deleted jpg, hand search the output paying
# attention to exif date/time strings in the ascii column.
#
# Each line generated by hexdump begins with a hexadecimal
# number that indicates the byte-offset, such as 0d6bf600
#
# the following line converts that number to base-ten

 printf "%d" 0x0d6bf600

# Calculate the number of bytes in the file that
# follow the offset:  filesize - byteoffset
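#
# for example (the offset here is the sample value from above;
# substitute the one found by hand):

 filesize=`stat -c %s CF`
 byteoffset=`printf "%d" 0x0d6bf600`
 echo $(( filesize - byteoffset ))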

 tail -c 287271936 CF > CF_short

# Verify the short file starts with the jpg marker FF D8

 hexdump -C CF_short | head





##########  MAKE FILES THAT ARE GREP-ABLE for FF D8 and FF D9

# verify that ^BEGIN and ^CLOSE do not already occur in the
# data (grep should not find any matches)

 grep ^BEGIN CF_short
 grep ^CLOSE CF_short

# \x0A is the newline character.  Make sure to insert it so
# that in the next step, grep will report the offset of the
# markers.

 cat CF_short | sed -e 's/\xFF\xD8/\x0ABEGIN/' > SED_begin_1
 cat CF_short | sed -e 's/\xFF\xD9/\x0ACLOSE/' > SED_close_1

# Note that sed will replace 2 bytes with 6 bytes. Note also
# that the byte offsets for the close markers will indicate
# the beginning of the 2-byte markers, not their ends.  These
# 2 issues will need to be accounted for later.




##########  REMOVE UNIMPORTANT LINES:

# "strings" strips non-printable characters.
# The 2nd grep filters out unwanted lines.
# The 2nd sed leaves only the byte offset.

 grep -a -b ^BEGIN SED_begin_1 | strings | grep BEGIN | sed -e 's#:BEGIN.*$##' > SED_begin_2

 grep -a -b ^CLOSE SED_close_1 | strings | grep CLOSE | sed -e 's#:CLOSE.*$##' > SED_close_2

 wc SED_begin_2
# 232  232 2217 SED_begin_2

 wc SED_close_2
# 2391  2391 23802 SED_close_2

# note the excess number of CLOSEs, presumably left over
# from previous uses of the flash card. these excess
# CLOSEs occur (?) at the end of the CF card, and in the
# not-overwritten space that exists between the jpgs




############  INSPECT THE BYTE-DISTANCE BETWEEN SUCCESSIVE BEGINS
############  THIS STEP IS OPTIONAL

old_offset=0;
n_begin=0;
for i in `cat SED_begin_2`; do
 (( n_begin += 1 ));
 new_offset=${i/\:BEGIN*/};
 distance=$(( new_offset - old_offset ));
# if [ $distance -lt 4096 ]; #then
  printf "%4d "        $n_begin ;
  printf "%10d %10d "  $new_offset  $old_offset;
  printf "%10d\n"      $distance;
# fi;
 old_offset=$new_offset;
done

# everything looks good so far (every other distance is 4096+4:
# the embedded thumbnail's BEGIN lies 4096 bytes after the main
# BEGIN because of the exif header, and the sed substitution in
# between added 4 more bytes)





########## ADJUST THE BYTE OFFSETS

# Subtract 4(n-1) bytes from the nth offset, since each earlier
# sed substitution grew the file by 4 bytes (2 bytes became 6)
# Add 2 bytes to the close offsets to include FF D9
# Subtract 1 because the inserted newline sits where the
# marker's first byte (FF) was, and grep -b reports the start
# of the line that follows the newline

 rm -f SED_begin_3
 nn=-1; for i in `cat SED_begin_2`; do (( nn += 1 )); echo $(( i - 4 * nn - 1 )) >> SED_begin_3; done

 rm -f SED_close_3
 nn=-1; for i in `cat SED_close_2`; do (( nn += 1 )); echo $(( 2 + i - 4 * nn - 1 )) >> SED_close_3; done
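
# (an optional spot-check of the adjustment: the first entry in
# SED_begin_3 should point exactly at an FF D8 pair in CF_short)

 dd if=CF_short bs=1 skip=`head -1 SED_begin_3` count=2 2>/dev/null | hexdump -C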



##########  GRAB EVERY OTHER LINE, BEGINNING WITH 1ST

# this is necessary due to the embedded jpg thumbnail

 cat SED_begin_3 | sed -n '1~2p' > SED_begin_4

 wc SED_begin_4

# 116  116 1107 SED_begin_4

# looks like 116 jpgs will be recovered !!






############  CALCULATE THE EXTENT OF EACH JPG

# if this code runs smoothly, uncomment the CPU intensive
# dd command and run it again to dissect out the jpgs.

rm -f recovered*jpg;
old_close_offset=0;
n_begin=0;
for begin_offset in `cat SED_begin_4`; do
 (( n_begin += 1 ));
# Now find the second closing marker. The first
# closing marker belongs to the embedded thumbnail
 n_found=0;
 n_close=0;
 for close_offset in `cat SED_close_3`; do
  (( n_close += 1 ));
  if [ $close_offset -gt $begin_offset ]; then
   (( n_found += 1 ));
   if [ $n_found -eq 2 ]; then
    break;
   fi;
  fi;
 done;
 size_of_blk=$(( 1024 * 4 ));
 size_of_jpg=$(( $close_offset - $begin_offset ));
 size_of_gap=$(( $begin_offset - $old_close_offset ));
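 # begin_error is begin_offset modulo the block size; if it is
 # not 0, the block-based dd below would not start exactly at
 # the jpg's FF D8 marker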
 begin_error=$(( begin_offset - ( size_of_blk * ( begin_offset / size_of_blk ) ) ));
 skip=$(( begin_offset / size_of_blk ));
 # add 1 to round up
 chunk=$(( 1 + ( size_of_jpg / size_of_blk ) ));
 fn=`printf "recovered_%04d.jpg" $n_begin`;
 printf "%12s "           $fn;
 printf "%5d %5d "        $n_begin $n_close;
 printf "%10d %10d "      $size_of_gap $size_of_jpg;
 printf "%5d "            $begin_error;
 printf "%10d %10d "      $begin_offset $close_offset;
 printf "(%4d %6d %4d)\n" $size_of_blk $skip $chunk;
 old_close_offset=$close_offset;
#dd bs=${size_of_blk}c  skip=${skip}c count=${chunk}c if=CF_short | head -c $size_of_jpg > $fn;
done
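
# (once the dd line is uncommented and the loop re-run, a quick
# way to check the output: "file" should report JPEG image data
# for each recovered file, and each should end with the FF D9
# close marker)

 for fn in recovered_*.jpg; do
  file $fn;
  tail -c 2 $fn | hexdump -C;
 done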

