lseek to 0, then write 0 bytes is ftruncate: under DOS, a write of 0 bytes truncates the file at the current position.

/cco
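For anyone who wants to try the same pairing outside DOS, here is a rough analogue in Python. It is only a sketch: `truncate_at_start` and the temporary file name are made up for illustration, and `os.ftruncate` stands in for the zero-length DOS write, since a zero-length POSIX write leaves the file size alone.

```python
import os
import tempfile


def truncate_at_start(path):
    """Hypothetical helper: seek to offset 0, then cut the file there."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Counterpart of int 21h, AX=4200h, CX:DX=0 (seek from start of file).
        pos = os.lseek(fd, 0, os.SEEK_SET)
        # Counterpart of int 21h, AH=40h with CX=0 (assumed stand-in, see above).
        os.ftruncate(fd, pos)
    finally:
        os.close(fd)


# Small demonstration on a scratch file that already holds stale data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "out.bin")
    with open(demo_path, "wb") as f:
        f.write(b"stale output from an earlier run")
    size_before = os.path.getsize(demo_path)
    truncate_at_start(demo_path)
    size_after = os.path.getsize(demo_path)
```

After the call the scratch file is empty, so anything written afterwards starts from a clean file rather than overwriting the front of old contents.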
On May 20, 2012, at 10:46 PM, Kragen Javier Sitaker <kra...@canonical.org> wrote:
> Here's a little analysis of the disassembly.
>
> On Mon, May 21, 2012 at 01:11:18AM -0400, Kragen Javier Sitaker wrote:
>> On Wed, May 09, 2012 at 06:39:25PM +0200, Dave Long wrote:
>>> Apropos the bootstrapping thread[0], here's another hex loader:
>>>
>>> 0000000: 31c9 bf00 03ba 8a01 b40a cd21 a18b 013c  1..........!...<
>>> 0000010: 047c 17b8 0001 01c8 bb00 0202 1e8c 0189  .|..............
>>> 0000020: 07be 8c01 a5a5 9041 ebdb 31c0 a320 0289  .......A..1.. ..
>>> 0000030: cd31 c9be 0003 bf00 02bb 0100 31c9 31d2  .1..........1.1.
>>> 0000040: b800 42cd 21b8 0040 cd21 ac31 d231 c0ac  ..B.!..@.!.1.1..
>>> 0000050: 3c20 740f bb01 0101 cb29 da01 f889 c38b  < t......)......
>>> 0000060: 0701 c231 c0ac 0c20 d410 d503 2c09 c0e0  ...1... ....,...
>>> 0000070: 0401 c2ac 0c20 d410 d503 2c09 01c2 9090  ..... ....,.....
>>> 0000080: b402 cd21 4139 e975 c1c3 5000            ...!A9.u..P.
>>
>> Here's the disassembly, for my benefit and for whoever else is reading this.
>>
>> kragen@VOSTRO9:~/devel$ objdump -m i8086 -b binary --adjust-vma=0x100 -D loader.com
>>
>> loader.com: file format binary
>>
>>
>> Disassembly of section .data:
>>
>> 00000100 <.data>:
>> 100: 31 c9        xor %cx,%cx
>> 102: bf 00 03     mov $0x300,%di
>> 105: ba 8a 01     mov $0x18a,%dx
>> 108: b4 0a        mov $0xa,%ah
>> 10a: cd 21        int $0x21
>
> int 21h function 0ah: buffered input from standard input, with buffer at
> %dx, which points just past the end of the program. Not sure what's up
> with %cx and %di here.
>
> Note that 105 here is the jump target of the instruction at 128, so the
> initialization of %cx and %di is outside of an input loop.
>
>> 10c: a1 8b 01     mov 0x18b,%ax
>
> "number of chars actually read"
>
>> 10f: 3c 04        cmp $0x4,%al
>> 111: 7c 17        jl 0x12a
>
> If less than 4 chars read, exit the loop.
>
>> 113: b8 00 01     mov $0x100,%ax
>> 116: 01 c8        add %cx,%ax
>> 118: bb 00 02     mov $0x200,%bx
>> 11b: 02 1e 8c 01  add 0x18c,%bl
>> 11f: 89 07        mov %ax,(%bx)
>
> We're computing a two-byte value here in %ax to store in memory at %bx.
> %bx is going to be 0x200 plus whatever was stored at 0x18c, which was the
> first byte of input. So we're indexing a table at 0x200 with the first
> byte of input. %ax is 0x100 plus %cx. %cx started out as 0 before
> entering the loop and gets incremented each time through the loop, and I
> guess probably the system call doesn't clobber it, so it's the line
> number. So this stores the current line number (or, equivalently, output
> offset) in a table entry indexed by the first byte of input.
>
> It seems a little alarming that we're storing a two-byte line number/byte
> offset in a single-byte table entry. I suppose that's not a problem as
> long as your labels are always at least two letters apart... but isn't
> there an x86 addressing mode that makes that problem easier? So you could
> do `mov %ax, [0x200+2*bx]` or something, with just the input byte in bx?
> Probably then you'd want to initialize %di to 0x400 in case somebody wants
> to use extended ASCII labels.
>
>> 121: be 8c 01     mov $0x18c,%si
>> 124: a5           movsw %ds:(%si),%es:(%di)
>> 125: a5           movsw %ds:(%si),%es:(%di)
>
> Now we append the first four bytes of input to the buffer at %di, which
> was initialized to 0x300.
>
>> 126: 90           nop
>> 127: 41           inc %cx
>> 128: eb db        jmp 0x105
>
> Okay, so that's the end of the input loop. From here we have
> straight-line code until the output loop.
>
>> 12a: 31 c0        xor %ax,%ax
>> 12c: a3 20 02     mov %ax,0x220
>
> Wiping out the definition of the "space" label.
>
>> 12f: 89 cd        mov %cx,%bp
>
> Okay, so the total output size goes into %bp.
>
>> 131: 31 c9        xor %cx,%cx
>> 133: be 00 03     mov $0x300,%si
>
> We're gonna be copying from the stored input program text?
>
>> 136: bf 00 02     mov $0x200,%di
>
> ...into the symbol table?
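To make the overlap worry about the table at 0x200 concrete, here is a small Python model of the input loop's label store, as I read instructions 113 through 11f. It is a sketch under assumptions: `record_labels`, `lookup`, the flat `bytearray` standing in for memory, and the sample input lines are all invented for illustration.

```python
TABLE_BASE = 0x200  # from `mov $0x200,%bx`
LOAD_BASE = 0x100   # from `mov $0x100,%ax`


def record_labels(lines):
    """Model of the input loop: one 16-bit store per 4-character line."""
    mem = bytearray(0x400)  # hypothetical flat memory covering the table
    for line_no, line in enumerate(lines):
        if len(line) < 4:  # `cmp $0x4,%al; jl 0x12a`: a short line ends input
            break
        # `add 0x18c,%bl`: the first input byte becomes the low byte of %bx.
        slot = TABLE_BASE + (ord(line[0]) & 0xFF)
        addr = LOAD_BASE + line_no
        # `mov %ax,(%bx)`: a two-byte little-endian store at a one-byte stride.
        mem[slot:slot + 2] = addr.to_bytes(2, "little")
    return mem


def lookup(mem, ch):
    """Read back the 16-bit table entry for label character `ch`."""
    slot = TABLE_BASE + ord(ch)
    return int.from_bytes(mem[slot:slot + 2], "little")


# Labels two letters apart ('a' and 'c') keep their values.
apart = record_labels(["a 90", "  90", "c 90"])
a_when_apart = lookup(apart, "a")
c_when_apart = lookup(apart, "c")

# Adjacent labels ('a' then 'b'): the store for 'b' lands on the high byte of 'a'.
adjacent = record_labels(["a 90", "  90", "b 90"])
a_when_adjacent = lookup(adjacent, "a")
b_when_adjacent = lookup(adjacent, "b")
```

In this model the entry for 'a' survives when the next label is 'c', but defining 'b' afterwards rewrites the high byte of the 'a' entry with the low byte of the 'b' address, which is the collision described in the quoted paragraph.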
>
>> 139: bb 01 00     mov $0x1,%bx
>> 13c: 31 c9        xor %cx,%cx
>
> That seems a little redundant. %cx is already pretty zeroed.
>
>> 13e: 31 d2        xor %dx,%dx
>> 140: b8 00 42     mov $0x4200,%ax
>> 143: cd 21        int $0x21
>
> 42h is lseek: set current file position. 00h in %al is from the start of
> the file. 1 in %bx is fd 1, stdout. %cx:%dx = 0 is the offset from the
> start of the file. Not yet sure why this lseek is useful; isn't that
> where you normally start writing the output if it's been redirected?
>
>> 145: b8 00 40     mov $0x4000,%ax
>> 148: cd 21        int $0x21
>
> 0x40 is write(), which is somewhat unexpected, since we haven't done any
> decoding yet. %cx is the number of bytes to write, which is presumably
> still 0. So this is sort of a mystery, maybe leftover code? Or maybe I
> screwed up the disassembly? It's the end of the straight-line code; the
> output loop starts here, which I still haven't really begun to analyze;
> perhaps tomorrow:
>
>> 14a: ac           lods %ds:(%si),%al
>> 14b: 31 d2        xor %dx,%dx
>> 14d: 31 c0        xor %ax,%ax
>> 14f: ac           lods %ds:(%si),%al
>> 150: 3c 20        cmp $0x20,%al
>> 152: 74 0f        je 0x163
>> 154: bb 01 01     mov $0x101,%bx
>> 157: 01 cb        add %cx,%bx
>> 159: 29 da        sub %bx,%dx
>> 15b: 01 f8        add %di,%ax
>> 15d: 89 c3        mov %ax,%bx
>> 15f: 8b 07        mov (%bx),%ax
>> 161: 01 c2        add %ax,%dx
>> 163: 31 c0        xor %ax,%ax
>> 165: ac           lods %ds:(%si),%al
>> 166: 0c 20        or $0x20,%al
>> 168: d4 10        aam $0x10
>> 16a: d5 03        aad $0x3
>> 16c: 2c 09        sub $0x9,%al
>> 16e: c0 e0 04     shl $0x4,%al
>> 171: 01 c2        add %ax,%dx
>> 173: ac           lods %ds:(%si),%al
>> 174: 0c 20        or $0x20,%al
>> 176: d4 10        aam $0x10
>> 178: d5 03        aad $0x3
>> 17a: 2c 09        sub $0x9,%al
>> 17c: 01 c2        add %ax,%dx
>> 17e: 90           nop
>> 17f: 90           nop
>> 180: b4 02        mov $0x2,%ah
>> 182: cd 21        int $0x21
>
> 2h is "write character (in %dl) to stdout".
>
>> 184: 41           inc %cx
>> 185: 39 e9        cmp %bp,%cx
>> 187: 75 c1        jne 0x14a
>> 189: c3           ret
>> 18a: 50           push %ax
>> ...
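The output loop is left unanalyzed above, so here is a tentative reading of just the repeated `or $0x20` / `aam $0x10` / `aad $0x3` / `sub $0x9` sequence (165 through 17c), modelled in Python. It looks like a branch-free hex digit decoder. The function names are invented, the sketch assumes the input characters are valid hex digits, and it leaves out the label arithmetic at 154 through 161.

```python
def aam(al, base):
    """AAM imm8: AH = AL // base, AL = AL % base. Returns (ah, al)."""
    return al // base, al % base


def aad(ah, al, base):
    """AAD imm8: AL = (AH * base + AL) mod 256, and AH becomes 0."""
    return (ah * base + al) & 0xFF


def hex_digit(ch):
    """Model of one or/aam/aad/sub run on a single input character."""
    al = ord(ch) | 0x20       # or $0x20,%al : folds 'A'..'F' onto 'a'..'f'
    ah, al = aam(al, 0x10)    # aam $0x10    : split into high and low nibble
    al = aad(ah, al, 3)       # aad $0x3     : 3 * high nibble + low nibble
    return (al - 9) & 0xFF    # sub $0x9,%al : '0' -> 0, 'a' -> 10


def hex_byte(hi, lo):
    """Combine two digits as at 16e..17c: shift the first left 4, add the second."""
    return (((hex_digit(hi) << 4) & 0xFF) + hex_digit(lo)) & 0xFF


decoded_digits = [hex_digit(c) for c in "0123456789abcdef"]
first_two_bytes = [hex_byte("3", "1"), hex_byte("C", "9")]  # "31", "C9"
```

The trick works because the high nibble is 3 for '0'..'9' and 6 for 'a'..'f': three times the nibble gives 9 or 18, so subtracting 9 leaves an offset of 0 for decimal digits and 9 for letters, and the letters' low nibbles start at 1.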
>
> Kragen
> --
> To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss