Suppose you want to be able to execute the Visitor pattern as quickly as possible on some tree structure. You could compile your tree structure into executable code, each node a subroutine which invokes a method of the visitor object — traditionally each node type invokes a different method — and then passes the visitor object to each child node.
Using the "quaject" approach described in Henry Massalin's "Synthesis" thesis, and passing the pointer to the visitor object itself in %ebx, the code to invoke the appropriate method on the visitor might look like this: mov %ebx, %edx add $20, %edx call *%edx jc 1f ret 1: Here we are checking the carry flag to see if the visitor object is requesting that we skip child nodes; if so, we return immediately. The visitor "method" in this case is the code at offset 20 from the visitor base pointer; if its code is very short, it might be entirely inline at that point, but in many cases it will be longer, and only a call instruction will be present there. The called code will receive a pointer to the visitor itself in `%ebx`. However, we probably want to pass some additional information to the visitor method; for example, if we're a variable node in a program AST, we might want to pass the name of the variable, or if we're an element node in a DOM, we might want to pass the tag name and attributes. So the entire call might look like this: mov $0xfe38d080, %eax mov %ebx, %edx add $20, %edx call *%edx jc 1f ret 1: Following the call to the visitor method, we pass the visitor to the children. In the standard i386 GCC calling convention, `%ebx`, `%esi`, `%edi`, and `%ebp` are callee-saved registers, so if the visitor method follows this convention, we still have `%ebx` pointing to the visitor after it returns. So we can simply call our children one by one, implicitly passing them the visitor, then return: call 0xfe381000 call 0xfe38d844 call 0xfe391daa ret For many applications, though, the visitor needs to take some action at the end of each node as well as the beginning. Perhaps it could inspect the child count at the beginning, maintaining its own stack of remaining child counts, and invoke the appropriate code when a child count reaches zero; but it's probably simpler to just notify it explicitly, by invoking another method on it before returning. This suggests putting the node pointer in a callee-saved register too, such as %ebp. With this additional modification, our example node looks like this: 0: 55 push %ebp 1: bd 80 d0 38 fe mov $0xfe38d080,%ebp 6: 89 da mov %ebx,%edx 8: 83 c2 14 add $0x14,%edx b: ff d2 call *%edx d: 73 0f jae 0x1e f: e8 fc 0f 38 fe call 0xfe381010 14: e8 40 d8 38 fe call 0xfe38d859 19: e8 a6 1d 39 fe call 0xfe391dc4 1e: 89 da mov %ebx,%edx 20: 83 c2 1c add $0x1c,%edx 23: ff d2 call *%edx 25: 5d pop %ebp 26: c3 ret (Note that this version has acquired an unfortunate limit of 127 bytes of child nodes, which is to say, 25 of them.) That's 39 bytes for an executable representation of the data pointer `0xfe38d080` plus a variable-length list of child pointers that happens to contain three pointers; the most straightforward way to represent this struct node { enum { document_node, element_node, text_node } nodetype; struct nodedata *data; int n_children; struct node *child[0]; }; would have needed 24 bytes, so making the data structure executable has cost us less than a factor of 2 in space. (In the case where the children are sufficiently small, we can inline them, eliminating another 6 bytes (16%) of call and return.) We can guarantee the visitor code a stronger calling convention than the usual; an executable tree built in the above fashion behaves in a very stereotyped way: of the caller-saved registers, it uses only `%edx`, plus `%eax` as an argument, so the visitor code is free to use `%ecx` as a private global variable, unless it calls some other code, in which case it must save it as usual. We can free up `%edx` too for the visitor's use as follows: 0: 55 push %ebp 1: 53 push %ebx 2: bd 80 d0 38 fe mov $0xfe38d080,%ebp 7: 83 c3 14 add $0x14,%ebx a: ff d3 call *%ebx c: 73 0f jae 0x1d e: e8 fc 0f 38 fe call 0xfe38100f 13: e8 40 d8 38 fe call 0xfe38d858 18: e8 a6 1d 39 fe call 0xfe391dc3 1d: 83 c3 08 add $0x8,%ebx 20: ff d3 call *%ebx 22: 5b pop %ebx 23: 5d pop %ebp 24: c3 ret This has the side advantage of saving us two bytes (5%), but it also means the visitor method is invoked with a pointer to the visitor method rather than the visitor object; it is likely to need to subtract the method offset and add it back later. This seems like a fair tradeoff for giving it two private registers instead of one. Additionally, the return value of the visitor method (that is, whatever it leaves in `%eax`) is either passed to the next visitor method called or is the return value of the entire node traversal. For whatever it matters in today's world, this also means that the per-node overhead for tree traversal is 14 instructions, which seems pretty small. It may blow up your icache, though, and its single conditional jump might be mispredicted more often than the several in a more traditional implementation. Consider this C implementation, listing generated as follows: gcc -g -c -Wall -O4 -fomit-frame-pointer -Wa,-adhlns=nonquajectdom.lst nonquajectdom.c 1 .file "nonquajectdom.c" 2 .text 3 .Ltext0: 4 .p2align 4,,15 6 _visit_node: 7 .LFB1: 8 .file 1 "nonquajectdom.c" 1:nonquajectdom.c **** struct node { 2:nonquajectdom.c **** enum { document_node, element_node, text_node } nodetype; 3:nonquajectdom.c **** struct nodedata *data; 4:nonquajectdom.c **** int n_children; 5:nonquajectdom.c **** struct node *child[0]; 6:nonquajectdom.c **** }; 7:nonquajectdom.c **** 8:nonquajectdom.c **** typedef int bool; 9:nonquajectdom.c **** 10:nonquajectdom.c **** struct visitor { 11:nonquajectdom.c **** void *visitor_data; 12:nonquajectdom.c **** bool (*document_callback)(struct visitor *self, struct nodedata*); 13:nonquajectdom.c **** void (*document_end_callback)(struct visitor *self, struct nodedata*); 14:nonquajectdom.c **** bool (*element_callback)(struct visitor *self, struct nodedata*); 15:nonquajectdom.c **** void (*element_end_callback)(struct visitor *self, struct nodedata*); 16:nonquajectdom.c **** void (*text_callback)(struct visitor *self, struct nodedata*); 17:nonquajectdom.c **** }; 18:nonquajectdom.c **** 19:nonquajectdom.c **** void visit_node(struct node *n, struct visitor *v); 20:nonquajectdom.c **** 21:nonquajectdom.c **** static inline void visit_children(struct node *n, struct visitor *v) 22:nonquajectdom.c **** { 23:nonquajectdom.c **** int ii; 24:nonquajectdom.c **** 25:nonquajectdom.c **** for (ii = 0; ii < n->n_children; ii++) { 26:nonquajectdom.c **** visit_node(n->child[ii], v); 27:nonquajectdom.c **** } 28:nonquajectdom.c **** } 29:nonquajectdom.c **** 30:nonquajectdom.c **** static void _visit_node(struct node *n, struct visitor *v) 31:nonquajectdom.c **** { 9 .loc 1 31 0 10 .cfi_startproc 11 .LVL0: 12 0000 83EC1C subl $28, %esp 13 .LCFI0: 14 .cfi_def_cfa_offset 32 15 0003 895C2410 movl %ebx, 16(%esp) 16 0007 89C3 movl %eax, %ebx 17 .cfi_offset 3, -16 32:nonquajectdom.c **** switch (n->nodetype) { 18 .loc 1 32 0 19 0009 8B00 movl (%eax), %eax 20 .LVL1: 31:nonquajectdom.c **** { 21 .loc 1 31 0 22 000b 89742414 movl %esi, 20(%esp) 23 000f 89D6 movl %edx, %esi 24 .cfi_offset 6, -12 25 0011 897C2418 movl %edi, 24(%esp) 26 .loc 1 32 0 27 0015 83F801 cmpl $1, %eax 28 0018 7476 je .L4 29 .cfi_offset 7, -8 30 001a 7224 jb .L3 31 001c 83F802 cmpl $2, %eax 32 001f 750D jne .L1 33:nonquajectdom.c **** case element_node: 34:nonquajectdom.c **** if (v->element_callback(v, n->data)) visit_children(n, v); 35:nonquajectdom.c **** v->element_end_callback(v, n->data); 36:nonquajectdom.c **** break; 37:nonquajectdom.c **** case document_node: 38:nonquajectdom.c **** if (v->document_callback(v, n->data)) visit_children(n, v); 39:nonquajectdom.c **** v->document_end_callback(v, n->data); 40:nonquajectdom.c **** break; 41:nonquajectdom.c **** case text_node: 42:nonquajectdom.c **** v->text_callback(v, n->data); 33 .loc 1 42 0 34 0021 8B4304 movl 4(%ebx), %eax 35 0024 891424 movl %edx, (%esp) 36 0027 89442404 movl %eax, 4(%esp) 37 002b FF5214 call *20(%edx) 38 .LVL2: 39 .L1: 43:nonquajectdom.c **** break; 44:nonquajectdom.c **** } 45:nonquajectdom.c **** } 40 .loc 1 45 0 41 002e 8B5C2410 movl 16(%esp), %ebx 42 .LVL3: 43 0032 8B742414 movl 20(%esp), %esi 44 .LVL4: 45 0036 8B7C2418 movl 24(%esp), %edi 46 003a 83C41C addl $28, %esp 47 .cfi_remember_state 48 .LCFI1: 49 .cfi_def_cfa_offset 4 50 .cfi_restore 7 51 .cfi_restore 6 52 .cfi_restore 3 53 003d C3 ret 54 .LVL5: 55 003e 6690 .p2align 4,,7 56 .p2align 3 57 .L3: 58 .LCFI2: 59 .cfi_restore_state 38:nonquajectdom.c **** if (v->document_callback(v, n->data)) visit_children(n, v); 60 .loc 1 38 0 61 0040 8B4304 movl 4(%ebx), %eax 62 0043 891424 movl %edx, (%esp) 63 0046 89442404 movl %eax, 4(%esp) 64 004a FF5204 call *4(%edx) 65 .LVL6: 66 004d 85C0 testl %eax, %eax 67 004f 7422 je .L8 68 .LVL7: 69 .LBB12: 70 .LBB13: 25:nonquajectdom.c **** for (ii = 0; ii < n->n_children; ii++) { 71 .loc 1 25 0 72 0051 8B4308 movl 8(%ebx), %eax 73 0054 85C0 testl %eax, %eax 74 0056 7E1B jle .L8 75 0058 31FF xorl %edi, %edi 76 .LVL8: 77 005a 8DB60000 .p2align 4,,7 77 0000 78 .p2align 3 79 .L9: 80 .LBB14: 81 .LBB15: 46:nonquajectdom.c **** 47:nonquajectdom.c **** void visit_node(struct node *n, struct visitor *v) 48:nonquajectdom.c **** { 49:nonquajectdom.c **** _visit_node(n, v); 82 .loc 1 49 0 83 0060 8B44BB0C movl 12(%ebx,%edi,4), %eax 84 0064 89F2 movl %esi, %edx 85 .LBE15: 86 .LBE14: 25:nonquajectdom.c **** for (ii = 0; ii < n->n_children; ii++) { 87 .loc 1 25 0 88 0066 83C701 addl $1, %edi 89 .LVL9: 90 .LBB17: 91 .LBB16: 92 .loc 1 49 0 93 0069 E892FFFF call _visit_node 93 FF 94 .LVL10: 95 .LBE16: 96 .LBE17: 25:nonquajectdom.c **** for (ii = 0; ii < n->n_children; ii++) { 97 .loc 1 25 0 98 006e 3B7B08 cmpl 8(%ebx), %edi 99 0071 7CED jl .L9 100 .LVL11: 101 .L8: 102 .LBE13: 103 .LBE12: 39:nonquajectdom.c **** v->document_end_callback(v, n->data); 104 .loc 1 39 0 105 0073 8B4304 movl 4(%ebx), %eax 106 0076 893424 movl %esi, (%esp) 107 0079 89442404 movl %eax, 4(%esp) 108 007d FF5608 call *8(%esi) 45:nonquajectdom.c **** } 109 .loc 1 45 0 110 0080 8B5C2410 movl 16(%esp), %ebx 111 .LVL12: 112 0084 8B742414 movl 20(%esp), %esi 113 .LVL13: 114 0088 8B7C2418 movl 24(%esp), %edi 115 008c 83C41C addl $28, %esp 116 .cfi_remember_state 117 .cfi_restore 3 118 .cfi_restore 6 119 .cfi_restore 7 120 .LCFI3: 121 .cfi_def_cfa_offset 4 122 008f C3 ret 123 .LVL14: 124 .p2align 4,,7 125 .p2align 3 126 .L4: 127 .LCFI4: 128 .cfi_restore_state 34:nonquajectdom.c **** if (v->element_callback(v, n->data)) visit_children(n, v); 129 .loc 1 34 0 130 0090 8B4304 movl 4(%ebx), %eax 131 0093 891424 movl %edx, (%esp) 132 0096 89442404 movl %eax, 4(%esp) 133 009a FF520C call *12(%edx) 134 .LVL15: 135 009d 85C0 testl %eax, %eax 136 009f 7422 je .L6 137 .LVL16: 138 .LBB18: 139 .LBB19: 25:nonquajectdom.c **** for (ii = 0; ii < n->n_children; ii++) { 140 .loc 1 25 0 141 00a1 8B5308 movl 8(%ebx), %edx 142 00a4 85D2 testl %edx, %edx 143 00a6 7E1B jle .L6 144 00a8 31FF xorl %edi, %edi 145 .LVL17: 146 00aa 8DB60000 .p2align 4,,7 146 0000 147 .p2align 3 148 .L7: 149 .LBB20: 150 .LBB21: 151 .loc 1 49 0 152 00b0 8B44BB0C movl 12(%ebx,%edi,4), %eax 153 00b4 89F2 movl %esi, %edx 154 .LBE21: 155 .LBE20: 25:nonquajectdom.c **** for (ii = 0; ii < n->n_children; ii++) { 156 .loc 1 25 0 157 00b6 83C701 addl $1, %edi 158 .LVL18: 159 .LBB23: 160 .LBB22: 161 .loc 1 49 0 162 00b9 E842FFFF call _visit_node 162 FF 163 .LVL19: 164 .LBE22: 165 .LBE23: 25:nonquajectdom.c **** for (ii = 0; ii < n->n_children; ii++) { 166 .loc 1 25 0 167 00be 3B7B08 cmpl 8(%ebx), %edi 168 00c1 7CED jl .L7 169 .LVL20: 170 .L6: 171 .LBE19: 172 .LBE18: 35:nonquajectdom.c **** v->element_end_callback(v, n->data); 173 .loc 1 35 0 174 00c3 8B4304 movl 4(%ebx), %eax 175 00c6 893424 movl %esi, (%esp) 176 00c9 89442404 movl %eax, 4(%esp) 177 00cd FF5610 call *16(%esi) 45:nonquajectdom.c **** } 178 .loc 1 45 0 179 00d0 8B5C2410 movl 16(%esp), %ebx 180 .LVL21: 181 00d4 8B742414 movl 20(%esp), %esi 182 .LVL22: 183 00d8 8B7C2418 movl 24(%esp), %edi 184 00dc 83C41C addl $28, %esp 185 .cfi_restore 3 186 .cfi_restore 6 187 .cfi_restore 7 188 .LCFI5: 189 .cfi_def_cfa_offset 4 190 00df C3 ret 191 .cfi_endproc 192 .LFE1: 194 .p2align 4,,15 195 .globl visit_node 197 visit_node: 198 .LFB2: 48:nonquajectdom.c **** { 199 .loc 1 48 0 200 .cfi_startproc 201 .LVL23: 202 .loc 1 49 0 203 00e0 8B542408 movl 8(%esp), %edx 204 00e4 8B442404 movl 4(%esp), %eax 205 00e8 E913FFFF jmp _visit_node 205 FF 206 .cfi_endproc 207 .LFE2: 209 .Letext0: DEFINED SYMBOLS *ABS*:0000000000000000 nonquajectdom.c /tmp/cc4F3UeL.s:6 .text:0000000000000000 _visit_node /tmp/cc4F3UeL.s:197 .text:00000000000000e0 visit_node NO UNDEFINED SYMBOLS The above is really hard for me to read, so here's the disassembly from `objdump -d`: /home/default/devel/aspmisc/nonquajectdom.o: file format elf32-i386 Disassembly of section .text: 00000000 <_visit_node>: 0: 83 ec 1c sub $0x1c,%esp 3: 89 5c 24 10 mov %ebx,0x10(%esp) 7: 89 c3 mov %eax,%ebx 9: 8b 00 mov (%eax),%eax b: 89 74 24 14 mov %esi,0x14(%esp) f: 89 d6 mov %edx,%esi 11: 89 7c 24 18 mov %edi,0x18(%esp) 15: 83 f8 01 cmp $0x1,%eax 18: 74 76 je 90 <_visit_node+0x90> 1a: 72 24 jb 40 <_visit_node+0x40> 1c: 83 f8 02 cmp $0x2,%eax 1f: 75 0d jne 2e <_visit_node+0x2e> 21: 8b 43 04 mov 0x4(%ebx),%eax 24: 89 14 24 mov %edx,(%esp) 27: 89 44 24 04 mov %eax,0x4(%esp) 2b: ff 52 14 call *0x14(%edx) 2e: 8b 5c 24 10 mov 0x10(%esp),%ebx 32: 8b 74 24 14 mov 0x14(%esp),%esi 36: 8b 7c 24 18 mov 0x18(%esp),%edi 3a: 83 c4 1c add $0x1c,%esp 3d: c3 ret 3e: 66 90 xchg %ax,%ax 40: 8b 43 04 mov 0x4(%ebx),%eax 43: 89 14 24 mov %edx,(%esp) 46: 89 44 24 04 mov %eax,0x4(%esp) 4a: ff 52 04 call *0x4(%edx) 4d: 85 c0 test %eax,%eax 4f: 74 22 je 73 <_visit_node+0x73> 51: 8b 43 08 mov 0x8(%ebx),%eax 54: 85 c0 test %eax,%eax 56: 7e 1b jle 73 <_visit_node+0x73> 58: 31 ff xor %edi,%edi 5a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 60: 8b 44 bb 0c mov 0xc(%ebx,%edi,4),%eax 64: 89 f2 mov %esi,%edx 66: 83 c7 01 add $0x1,%edi 69: e8 92 ff ff ff call 0 <_visit_node> 6e: 3b 7b 08 cmp 0x8(%ebx),%edi 71: 7c ed jl 60 <_visit_node+0x60> 73: 8b 43 04 mov 0x4(%ebx),%eax 76: 89 34 24 mov %esi,(%esp) 79: 89 44 24 04 mov %eax,0x4(%esp) 7d: ff 56 08 call *0x8(%esi) 80: 8b 5c 24 10 mov 0x10(%esp),%ebx 84: 8b 74 24 14 mov 0x14(%esp),%esi 88: 8b 7c 24 18 mov 0x18(%esp),%edi 8c: 83 c4 1c add $0x1c,%esp 8f: c3 ret 90: 8b 43 04 mov 0x4(%ebx),%eax 93: 89 14 24 mov %edx,(%esp) 96: 89 44 24 04 mov %eax,0x4(%esp) 9a: ff 52 0c call *0xc(%edx) 9d: 85 c0 test %eax,%eax 9f: 74 22 je c3 <_visit_node+0xc3> a1: 8b 53 08 mov 0x8(%ebx),%edx a4: 85 d2 test %edx,%edx a6: 7e 1b jle c3 <_visit_node+0xc3> a8: 31 ff xor %edi,%edi aa: 8d b6 00 00 00 00 lea 0x0(%esi),%esi b0: 8b 44 bb 0c mov 0xc(%ebx,%edi,4),%eax b4: 89 f2 mov %esi,%edx b6: 83 c7 01 add $0x1,%edi b9: e8 42 ff ff ff call 0 <_visit_node> be: 3b 7b 08 cmp 0x8(%ebx),%edi c1: 7c ed jl b0 <_visit_node+0xb0> c3: 8b 43 04 mov 0x4(%ebx),%eax c6: 89 34 24 mov %esi,(%esp) c9: 89 44 24 04 mov %eax,0x4(%esp) cd: ff 56 10 call *0x10(%esi) d0: 8b 5c 24 10 mov 0x10(%esp),%ebx d4: 8b 74 24 14 mov 0x14(%esp),%esi d8: 8b 7c 24 18 mov 0x18(%esp),%edi dc: 83 c4 1c add $0x1c,%esp df: c3 ret 000000e0 <visit_node>: e0: 8b 54 24 08 mov 0x8(%esp),%edx e4: 8b 44 24 04 mov 0x4(%esp),%eax e8: e9 13 ff ff ff jmp 0 <_visit_node> Let's slice that down to just the `element_node` case, which the listing above shows us begins at 0x90. 00000000 <_visit_node>: 0: 83 ec 1c sub $0x1c,%esp 3: 89 5c 24 10 mov %ebx,0x10(%esp) 7: 89 c3 mov %eax,%ebx 9: 8b 00 mov (%eax),%eax b: 89 74 24 14 mov %esi,0x14(%esp) f: 89 d6 mov %edx,%esi 11: 89 7c 24 18 mov %edi,0x18(%esp) 15: 83 f8 01 cmp $0x1,%eax 18: 74 76 je 90 <_visit_node+0x90> ... 90: 8b 43 04 mov 0x4(%ebx),%eax 93: 89 14 24 mov %edx,(%esp) 96: 89 44 24 04 mov %eax,0x4(%esp) 9a: ff 52 0c call *0xc(%edx) 9d: 85 c0 test %eax,%eax 9f: 74 22 je c3 <_visit_node+0xc3> a1: 8b 53 08 mov 0x8(%ebx),%edx a4: 85 d2 test %edx,%edx a6: 7e 1b jle c3 <_visit_node+0xc3> a8: 31 ff xor %edi,%edi (confusing multibyte nop padding removed) b0: 8b 44 bb 0c mov 0xc(%ebx,%edi,4),%eax b4: 89 f2 mov %esi,%edx b6: 83 c7 01 add $0x1,%edi b9: e8 42 ff ff ff call 0 <_visit_node> be: 3b 7b 08 cmp 0x8(%ebx),%edi c1: 7c ed jl b0 <_visit_node+0xb0> c3: 8b 43 04 mov 0x4(%ebx),%eax c6: 89 34 24 mov %esi,(%esp) c9: 89 44 24 04 mov %eax,0x4(%esp) cd: ff 56 10 call *0x10(%esi) d0: 8b 5c 24 10 mov 0x10(%esp),%ebx d4: 8b 74 24 14 mov 0x14(%esp),%esi d8: 8b 7c 24 18 mov 0x18(%esp),%edi dc: 83 c4 1c add $0x1c,%esp df: c3 ret 000000e0 <visit_node>: e0: 8b 54 24 08 mov 0x8(%esp),%edx e4: 8b 44 24 04 mov 0x4(%esp),%eax e8: e9 13 ff ff ff jmp 0 <_visit_node> I separated the recursive function from the entry point so that by making it `static`, GCC could pick a more reasonable calling convention for it; which worked, but only up to a point. Here our per-node overhead is 7 instructions of preamble, 2 instructions of dispatch (by chance, GCC happened to make dispatch fastest for this case), 3 instructions to call `element_callback` (using the three-byte `call` encoding optimized for function pointer tables), 2 instructions to conditionally skip the child nodes, 4 instructions of loop setup, 6 instructions per loop iteration (which we can count toward the child node's cost), 4 instructions to call `element_end_callback`, and 5 instructions of cleanup, for a total of (+ 7 2 3 2 4 6 4 5) = 33 instructions, of which four are conditional jumps, providing opportunities for branch misprediction. (However, the loop is simple enough that I imagine branch misprediction will be very rare.) The listing of the C code points out that if the visitor is not a quaject, just a regular struct with function pointers, then we can eliminate two `add` instructions, which probably don't matter, and 4 of the 37 bytes, which probably do, and also build visitors in plain C instead of assembly, at the cost of an extra indirection. That is: 0: 55 push %ebp 1: bd 80 d0 38 fe mov $0xfe38d080,%ebp 6: ff 53 04 call *0x4(%ebx) 9: 73 0f jae 0x1a b: e8 fc 0f 38 fe call 0xfe38100c 10: e8 40 d8 38 fe call 0xfe38d855 15: e8 a6 1d 39 fe call 0xfe391dc0 1a: ff 53 08 call *0x8(%ebx) 1d: 5d pop %ebp 1e: c3 ret Now our executable DOM node is 10 instructions and 31 bytes, only 7 bytes more than the 24 bytes the struct needs. Of those 7 bytes, 3 should properly be charged to the child nodes, so the overhead is more like 5 bytes and 8 instructions per node. <http://localhost:8000/wikipedia_en_all_nopic_01_2012/A/Dependency%20theory.html>, a snapshot of the Wikipedia page on "Dependency theory", has 512 elements, 681 text nodes, and 17 other nodes. Traversing the whole thing should cost some (* 8 (+ 512 681 17)) = 9680 instructions, or about ten microseconds. The document text is 48962 bytes. The tree structure in this form should cost (* 21 (+ 512 681 17)) = 25410 bytes, plus the cost of the metadata (element names, string sizes and lengths, attributes). The entire document tree, then, nearly fits in the 32KiB L1 cache of my netbook, (although I think that's split, so only 16KiB is instructions), and very easily indeed in its 512KiB L2 cache. The C code might be more efficient if its tree were binary. -- To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-tol