Under DOS, an lseek to offset 0 followed by a write of 0 bytes truncates the
file at that position, so the pair acts as ftruncate.
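
For anyone more used to POSIX: there a zero-length write does not truncate
anything, so the same effect takes an explicit ftruncate. A minimal Python
sketch of the analogue, assuming a redirected output file that still holds
data from an earlier run (the helper name truncate_at is made up for
illustration, it is not part of the loader):

```python
import os
import tempfile

def truncate_at(fd, offset):
    """Seek to offset, then cut the file off there (a POSIX stand-in for
    the DOS lseek + zero-byte write pair)."""
    pos = os.lseek(fd, offset, os.SEEK_SET)
    os.ftruncate(fd, pos)
    return pos

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"stale output from an earlier run")
    truncate_at(fd, 0)
    size_after = os.fstat(fd).st_size   # length of the file after truncation
finally:
    os.close(fd)
    os.remove(path)
```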

/cco

On May 20, 2012, at 10:46 PM, Kragen Javier Sitaker <kra...@canonical.org> wrote:

> Here's a little analysis of the disassembly.
> 
> On Mon, May 21, 2012 at 01:11:18AM -0400, Kragen Javier Sitaker wrote:
>> On Wed, May 09, 2012 at 06:39:25PM +0200, Dave Long wrote:
>>> Apropos the bootstrapping thread[0], here's another hex loader:
>>> 
>>>    0000000: 31c9 bf00 03ba 8a01 b40a cd21 a18b 013c  1..........!...<
>>>    0000010: 047c 17b8 0001 01c8 bb00 0202 1e8c 0189  .|..............
>>>    0000020: 07be 8c01 a5a5 9041 ebdb 31c0 a320 0289  .......A..1.. ..
>>>    0000030: cd31 c9be 0003 bf00 02bb 0100 31c9 31d2  .1..........1.1.
>>>    0000040: b800 42cd 21b8 0040 cd21 ac31 d231 c0ac  ..B.!..@.!.1.1..
>>>    0000050: 3c20 740f bb01 0101 cb29 da01 f889 c38b  < t......)......
>>>    0000060: 0701 c231 c0ac 0c20 d410 d503 2c09 c0e0  ...1... ....,...
>>>    0000070: 0401 c2ac 0c20 d410 d503 2c09 01c2 9090  ..... ....,.....
>>>    0000080: b402 cd21 4139 e975 c1c3 5000            ...!A9.u..P.
>> 
>> Here's the disassembly, for my benefit and for whoever else is reading this.
>> 
>>    kragen@VOSTRO9:~/devel$ objdump -m i8086 -b binary --adjust-vma=0x100 -D loader.com
>> 
>>    loader.com:     file format binary
>> 
>> 
>>    Disassembly of section .data:
>> 
>>    00000100 <.data>:
>>     100:   31 c9                   xor    %cx,%cx
>>     102:   bf 00 03                mov    $0x300,%di
>>     105:   ba 8a 01                mov    $0x18a,%dx
>>     108:   b4 0a                   mov    $0xa,%ah
>>     10a:   cd 21                   int    $0x21
> 
> int 21h function 0ah: buffered input from standard input, with buffer at %dx,
> which points at the two bytes (50 00) just past the final ret: the maximum
> line length, 0x50, followed by the count that DOS fills in.  Not sure what's
> up with %cx and %di here.
> 
> Note that 105 here is the jump target of the instruction at 128, so the
> initialization of %cx and %di is outside of an input loop.
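
The buffer layout for function 0ah is easy to get wrong, so here is a rough
Python model of what the call leaves in memory. This is only a sketch: the
function name buffered_input and the sample line "a b4" are my own
assumptions, and it ignores line editing and echo entirely.

```python
def buffered_input(memory, dx, typed):
    """Mimic the memory effect of int 21h / AH=0Ah for one typed line."""
    max_len = memory[dx]                         # byte 0: capacity, set by the caller
    data = typed.encode("ascii")[:max_len - 1]   # leave room for the CR
    memory[dx + 1] = len(data)                   # byte 1: count, excluding the CR
    memory[dx + 2:dx + 2 + len(data)] = data     # the characters themselves
    memory[dx + 2 + len(data)] = 0x0D            # terminating carriage return
    return memory[dx + 1]

memory = bytearray(0x400)
memory[0x18A] = 0x50          # the trailing "50" byte of loader.com
count = buffered_input(memory, 0x18A, "a b4")
```

So the count ends up at 0x18b and the first typed character at 0x18c, which
are exactly the two addresses the loader reads next.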
> 
>>     10c:   a1 8b 01                mov    0x18b,%ax
> 
> "number of chars actually read"
> 
>>     10f:   3c 04                   cmp    $0x4,%al
>>     111:   7c 17                   jl     0x12a
> 
> If fewer than 4 chars were read, exit the loop.
> 
>>     113:   b8 00 01                mov    $0x100,%ax
>>     116:   01 c8                   add    %cx,%ax
>>     118:   bb 00 02                mov    $0x200,%bx
>>     11b:   02 1e 8c 01             add    0x18c,%bl
>>     11f:   89 07                   mov    %ax,(%bx)
> 
> We're computing a two-byte value here in %ax to store in memory at %bx.  %bx
> is going to be 0x200 plus whatever was stored at 0x18c, which was the first
> byte of input.  So we're indexing a table at 0x200 with the first byte of
> input.  %ax is 0x100 plus %cx.  %cx started out as 0 before entering the loop
> and gets incremented each time through the loop, and I guess probably the
> system call doesn't clobber it, so it's the line number.  So this stores the
> current line number (or, equivalently, output offset) in a table entry
> indexed by the first byte of input.
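
The table store described above can be sketched in a few lines of Python.
The names TABLE, ORG and define_label are mine, chosen for illustration;
the constants come straight from the disassembly.

```python
import struct

TABLE = 0x200   # base of the label table, from "mov $0x200,%bx"
ORG = 0x100     # .COM load address, from "mov $0x100,%ax"

def define_label(memory, first_char, line_no):
    """Record the address of line line_no under the label first_char."""
    addr = TABLE + (first_char & 0xFF)                   # add 0x18c,%bl
    struct.pack_into("<H", memory, addr, ORG + line_no)  # mov %ax,(%bx)
    return addr

memory = bytearray(0x400)
addr = define_label(memory, ord("a"), 5)   # label 'a' on line 5
```

Note that the entry is a little-endian word, so it occupies the slot for
'a' and the slot for 'b'.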
> 
> It seems a little alarming that we're storing a two-byte line number/byte
> offset in a single-byte table entry.  I suppose that's not a problem as long
> as your labels are always at least two letters apart... but isn't there an
> x86 addressing mode that makes that problem easier?  So you could do
> `mov %ax, [0x200+2*bx]` or something, with just the input byte in bx?
> Probably then you'd want to initialize %di to 0x400 in case somebody wants to
> use extended ASCII labels.
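
To make the overlap concrete, here is a small Python comparison of the
one-byte stride the loader uses with a two-byte stride. The helper names
and the sample addresses are hypothetical. As far as I know, scaled-index
addressing only exists in the 386's 32-bit addressing modes, so 16-bit code
would have to double %bx itself (add %bx,%bx, say) before the store.

```python
import struct

def store_packed(table, char, value):
    struct.pack_into("<H", table, char, value)       # stride 1, as in the loader

def store_scaled(table, char, value):
    struct.pack_into("<H", table, 2 * char, value)   # stride 2, hypothetical fix

packed = bytearray(257)   # 256 one-byte slots plus spill for the last word
store_packed(packed, ord("b"), 0x0103)
store_packed(packed, ord("a"), 0x0105)    # high byte lands on b's low byte
b_packed = struct.unpack_from("<H", packed, ord("b"))[0]

scaled = bytearray(512)   # 256 two-byte slots, 0x200 through 0x3ff
store_scaled(scaled, ord("b"), 0x0103)
store_scaled(scaled, ord("a"), 0x0105)
b_scaled = struct.unpack_from("<H", scaled, 2 * ord("b"))[0]
```

With adjacent labels defined in this order, the packed table reads back
0x0101 for 'b' instead of 0x0103, while the scaled table keeps both entries
intact.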
> 
>>     121:   be 8c 01                mov    $0x18c,%si
>>     124:   a5                      movsw  %ds:(%si),%es:(%di)
>>     125:   a5                      movsw  %ds:(%si),%es:(%di)
> 
> Now we append the first four bytes of input to the buffer at %di, which was
> initialized to 0x300.
> 
>>     126:   90                      nop    
>>     127:   41                      inc    %cx
>>     128:   eb db                   jmp    0x105
> 
> Okay, so that's the end of the input loop.  From here we have straight-line
> code until the output loop.
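
Putting the pieces of the input loop together, a Python simulation of
0x105 through 0x128 might look like the following. It is a sketch under
assumptions: input_pass is a made-up name, the sample lines are invented,
and a Python list stands in for the int 21h call.

```python
import struct

def input_pass(lines):
    """Simulate the loop at 0x105-0x128 over an iterable of typed lines."""
    memory = bytearray(0x10000)
    cx, di = 0, 0x300                      # xor %cx,%cx / mov $0x300,%di
    for line in lines:
        data = line.encode("ascii")
        if len(data) < 4:                  # cmp $0x4,%al / jl 0x12a
            break
        # Label table entry, indexed by the first character of the line.
        struct.pack_into("<H", memory, 0x200 + data[0], 0x100 + cx)
        memory[di:di + 4] = data[:4]       # the two movsw instructions
        di += 4
        cx += 1                            # inc %cx
    return memory, cx, di

memory, cx, di = input_pass(["  b4", "a 4c", "  cd", "  21", ""])
```

After the pass, cx holds the number of lines accepted and the text buffer
at 0x300 holds four bytes per line.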
> 
>>     12a:   31 c0                   xor    %ax,%ax
>>     12c:   a3 20 02                mov    %ax,0x220
> 
> Wiping out the definition of the "space" label.
> 
>>     12f:   89 cd                   mov    %cx,%bp
> 
> Okay, so the total output size goes into %bp.
> 
>>     131:   31 c9                   xor    %cx,%cx
>>     133:   be 00 03                mov    $0x300,%si
> 
> We're gonna be copying from the stored input program text?
> 
>>     136:   bf 00 02                mov    $0x200,%di
> 
> ...into the symbol table?
> 
>>     139:   bb 01 00                mov    $0x1,%bx
>>     13c:   31 c9                   xor    %cx,%cx
> 
> That seems a little redundant.  %cx is already pretty zeroed.
> 
>>     13e:   31 d2                   xor    %dx,%dx
>>     140:   b8 00 42                mov    $0x4200,%ax
>>     143:   cd 21                   int    $0x21
> 
> 42h is lseek: set current file position.  00h in %al is from the start of the
> file.  1 in %bx is fd 1, stdout.  %cx:%dx = 0 is the offset from the start of
> the file.  Not yet sure why this lseek is useful; isn't that where you
> normally start writing the output if it's been redirected?
> 
>>     145:   b8 00 40                mov    $0x4000,%ax
>>     148:   cd 21                   int    $0x21
> 
> 0x40 is write(), which is somewhat unexpected, since we haven't done any
> decoding yet.  %cx is the number of bytes to write, which is presumably still
> 0.  So this is sort of a mystery, maybe leftover code?  Or maybe I screwed up
> the disassembly?  It's the end of the straight-line code; the output loop
> starts here, which I still haven't really begun to analyze; perhaps tomorrow:
> 
>>     14a:   ac                      lods   %ds:(%si),%al
>>     14b:   31 d2                   xor    %dx,%dx
>>     14d:   31 c0                   xor    %ax,%ax
>>     14f:   ac                      lods   %ds:(%si),%al
>>     150:   3c 20                   cmp    $0x20,%al
>>     152:   74 0f                   je     0x163
>>     154:   bb 01 01                mov    $0x101,%bx
>>     157:   01 cb                   add    %cx,%bx
>>     159:   29 da                   sub    %bx,%dx
>>     15b:   01 f8                   add    %di,%ax
>>     15d:   89 c3                   mov    %ax,%bx
>>     15f:   8b 07                   mov    (%bx),%ax
>>     161:   01 c2                   add    %ax,%dx
>>     163:   31 c0                   xor    %ax,%ax
>>     165:   ac                      lods   %ds:(%si),%al
>>     166:   0c 20                   or     $0x20,%al
>>     168:   d4 10                   aam    $0x10
>>     16a:   d5 03                   aad    $0x3
>>     16c:   2c 09                   sub    $0x9,%al
>>     16e:   c0 e0 04                shl    $0x4,%al
>>     171:   01 c2                   add    %ax,%dx
>>     173:   ac                      lods   %ds:(%si),%al
>>     174:   0c 20                   or     $0x20,%al
>>     176:   d4 10                   aam    $0x10
>>     178:   d5 03                   aad    $0x3
>>     17a:   2c 09                   sub    $0x9,%al
>>     17c:   01 c2                   add    %ax,%dx
>>     17e:   90                      nop    
>>     17f:   90                      nop    
>>     180:   b4 02                   mov    $0x2,%ah
>>     182:   cd 21                   int    $0x21
> 
> 2h is "write character (in %dl) to stdout".
> 
>>     184:   41                      inc    %cx
>>     185:   39 e9                   cmp    %bp,%cx
>>     187:   75 c1                   jne    0x14a
>>     189:   c3                      ret    
>>     18a:   50                      push   %ax
>>        ...
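
One piece of the output loop can be checked mechanically without analyzing
the rest: the or / aam / aad / sub sequence that appears twice, at 166 and
at 174. The Python model below follows the documented behavior of AAM and
AAD with an immediate operand. The function names are mine, and reading
the sequence as a hex-digit decoder is my interpretation, not something
stated earlier in the thread.

```python
def aam(ax, base):
    """AAM imm8: AH = AL // base, AL = AL % base."""
    al = ax & 0xFF
    return ((al // base) << 8) | (al % base)

def aad(ax, base):
    """AAD imm8: AL = (AL + AH * base) & 0xFF, AH = 0."""
    ah, al = ax >> 8, ax & 0xFF
    return (al + ah * base) & 0xFF

def hex_digit(ch):
    ax = ord(ch) | 0x20     # or $0x20,%al: fold upper case to lower
    ax = aam(ax, 0x10)      # AH = high nibble (3 or 6), AL = low nibble
    ax = aad(ax, 3)         # AL = 3*AH + AL, i.e. 9 + n or 18 + n
    return (ax - 9) & 0xFF  # sub $0x9,%al

decoded = [hex_digit(c) for c in "0123456789abcdefABCDEF"]
```

For '0' through '9' the high nibble is 3, so 3*3 - 9 adds nothing; for
'a' through 'f' it is 6, so 3*6 - 9 adds 9 to a low nibble of 1 through 6.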
> 
> Kragen
> -- 
> To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss