I've done some more testing of find_applet_by_name. The current code has
two tweakable parameters. KNOWN_APPNAME_OFFSETS is the number of known
offsets stored in the applet_nameofs array (plus one, because we don't
store the zero offset). This defaults to 8, which is too large if only a
few applets configured. This test in the code:
if (KNOWN_APPNAME_OFFSETS > 0 && NUM_APPLETS > 2*KNOWN_APPNAME_OFFSETS) {
reverts to a simple linear search with no stored offsets if there are
fewer than 16 applets. The hardcoded multiplier 2 is the second thing
that can be tweaked.
I ran tests for various values of NUM_APPLETS, KNOWN_APPNAME_OFFSETS
and the multiplier. The results are in the attached file test1.
With the default settings this gives:
applets cycles bloat
13 224.07 0
28 249.35 106
48 289.86 106
72 356.59 106
96 415.25 106
133 520.57 109
172 612.75 109
214 764.11 109
281 941.46 109
364 1162.98 107
494 1468.60 107
'cycles' is the average number of cycles for a call to find_applet_by_name
as reported by 'valgrind --tool=callgrind'. The original code using a
binary search averaged 353 cycles per call. 'bloat' is the bloat in
bytes relative to KNOWN_APPNAME_OFFSETS = 0 (so it doesn't include the
savings that result from not storing offsets for all names).
No, BusyBox doesn't have 494 applets (yet): I invented some fake ones.
With only a few applets configured the bloat will exceed the savings
arising from not storing offsets for all names (which is about 1.75 *
NUM_APPLETS). This could be reduced by increasing the multiplier.
Reducing the number of stored offsets would also help.
With a large number of applets it might be valuable to increase the
number of stored offsets to reduce the execution time.
After staring at the numbers and the code for a while I came up with
an alternative approach. The execution time of find_applet_by_name
depends, in large part, on the number of names between successive
known offsets. Call that the block size. It's currently
NUM_APPLETS/KNOWN_APPNAME_OFFSETS.
Suppose we make the block size configurable (APPNAME_BLOCK_SIZE) and
derive the number of offsets from that, NUM_APPLETS/APPNAME_BLOCK_SIZE.
Results of tests using this approach are in the attached file test2.
Some highlights:
block size 25 30 35 40
applets cycles bloat cycles bloat cycles bloat cycles bloat
13 224.07 0 224.07 0 224.07 0 224.07 0
28 462.83 0 462.83 0 462.83 0 462.83 0
48 840.88 0 840.88 0 840.88 0 840.88 0
72 703.86 57 703.86 57 703.86 57 1212.37 0
96 678.07 90 678.07 90 941.25 57 941.25 57
133 665.84 114 740.55 93 927.43 91 927.43 91
172 745.96 118 829.50 114 951.37 93 951.37 93
214 756.17 109 818.73 122 900.33 118 1028.08 114
281 798.10 138 863.99 130 949.66 109 1062.82 122
364 861.78 148 910.64 140 1016.15 132 1060.04 128
494 937.61 168 970.37 139 1052.08 148 1125.15 140
In general, compared with the fixed number of offsets, this trades
peformance for bytes at low numbers of applets and bytes for performance
at high numbers. A block size in the low 30s looks like a reasonable
compromise.
Ron
Test results for find_applet_by_name
The columns are:
1 number of applets (NUM_APPLETS)
2 number of known offsets (KNOWN_APPNAME_OFFSETS)
3 multiplier in test NUM_APPLETS > MULTIPLIER*KNOWN_APP_NAME_OFFSETS
4 number of names that need to be tested (NUM_APPLETS/KNOWN_APPNAME_OFFSETS)
5 average number of cycles per call
6 bloat in bytes relative to KNOWN_APPNAME_OFFSETS = 0
13 0 2 13.0 224.07 0
13 0 4 13.0 224.07 0
13 0 8 13.0 224.07 0
13 2 2 6.5 171.94 54
13 2 4 6.5 171.94 54
13 2 8 13.0 224.07 0
13 4 2 3.2 166.08 90
13 4 4 13.0 224.07 0
13 4 8 13.0 224.07 0
13 8 2 13.0 224.07 0
13 8 4 13.0 224.07 0
13 8 8 13.0 224.07 0
13 16 2 13.0 224.07 0
13 16 4 13.0 224.07 0
13 16 8 13.0 224.07 0
13 32 2 13.0 224.07 0
13 32 4 13.0 224.07 0
13 32 8 13.0 224.07 0
28 0 2 28.0 478.79 0
28 0 4 28.0 478.79 0
28 0 8 28.0 478.79 0
28 2 2 14.0 299.62 54
28 2 4 14.0 299.62 54
28 2 8 14.0 299.62 54
28 4 2 7.0 234.66 90
28 4 4 7.0 234.66 90
28 4 8 28.0 478.79 0
28 8 2 3.5 249.35 106
28 8 4 28.0 478.79 0
28 8 8 28.0 478.79 0
28 16 2 28.0 478.79 0
28 16 4 28.0 478.79 0
28 16 8 28.0 478.79 0
28 32 2 28.0 478.79 0
28 32 4 28.0 478.79 0
28 32 8 28.0 478.79 0
48 0 2 48.0 804.88 0
48 0 4 48.0 804.88 0
48 0 8 48.0 804.88 0
48 2 2 24.0 474.84 54
48 2 4 24.0 474.84 54
48 2 8 24.0 474.84 54
48 4 2 12.0 318.94 90
48 4 4 12.0 318.94 90
48 4 8 12.0 318.94 90
48 8 2 6.0 289.86 106
48 8 4 6.0 289.86 106
48 8 8 48.0 804.88 0
48 16 2 3.0 367.29 138
48 16 4 48.0 804.88 0
48 16 8 48.0 804.88 0
48 32 2 48.0 804.88 0
48 32 4 48.0 804.88 0
48 32 8 48.0 804.88 0
72 0 2 72.0 1220.15 0
72 0 4 72.0 1220.15 0
72 0 8 72.0 1220.15 0
72 2 2 36.0 703.29 57
72 2 4 36.0 703.29 57
72 2 8 36.0 703.29 57
72 4 2 18.0 442.22 90
72 4 4 18.0 442.22 90
72 4 8 18.0 442.22 90
72 8 2 9.0 356.59 106
72 8 4 9.0 356.59 106
72 8 8 9.0 356.59 106
72 16 2 4.5 409.25 138
72 16 4 4.5 409.25 138
72 16 8 72.0 1220.15 0
72 32 2 2.2 643.81 202
72 32 4 72.0 1220.15 0
72 32 8 72.0 1220.15 0
96 0 2 96.0 1714.42 0
96 0 4 96.0 1714.42 0
96 0 8 96.0 1714.42 0
96 2 2 48.0 941.25 57
96 2 4 48.0 941.25 57
96 2 8 48.0 941.25 57
96 4 2 24.0 554.67 90
96 4 4 24.0 554.67 90
96 4 8 24.0 554.67 90
96 8 2 12.0 415.25 106
96 8 4 12.0 415.25 106
96 8 8 12.0 415.25 106
96 16 2 6.0 445.23 138
96 16 4 6.0 445.23 138
96 16 8 96.0 1714.42 0
96 32 2 3.0 664.51 202
96 32 4 96.0 1714.42 0
96 32 8 96.0 1714.42 0
133 0 2 133.0 2525.99 0
133 0 4 133.0 2525.99 0
133 0 8 133.0 2525.99 0
133 2 2 66.5 1395.69 57
133 2 4 66.5 1395.69 57
133 2 8 66.5 1395.69 57
133 4 2 33.2 766.14 93
133 4 4 33.2 766.14 93
133 4 8 33.2 766.14 93
133 8 2 16.6 520.57 109
133 8 4 16.6 520.57 109
133 8 8 16.6 520.57 109
133 16 2 8.3 497.34 141
133 16 4 8.3 497.34 141
133 16 8 8.3 497.34 141
133 32 2 4.2 677.36 205
133 32 4 4.2 677.36 205
133 32 8 133.0 2525.99 0
172 0 2 172.0 3231.88 0
172 0 4 172.0 3231.88 0
172 0 8 172.0 3231.88 0
172 2 2 86.0 1765.25 57
172 2 4 86.0 1765.25 57
172 2 8 86.0 1765.25 57
172 4 2 43.0 952.66 93
172 4 4 43.0 952.66 93
172 4 8 43.0 952.66 93
172 8 2 21.5 612.75 109
172 8 4 21.5 612.75 109
172 8 8 21.5 612.75 109
172 16 2 10.8 538.36 141
172 16 4 10.8 538.36 141
172 16 8 10.8 538.36 141
172 32 2 5.4 696.73 205
172 32 4 5.4 696.73 205
172 32 8 172.0 3231.88 0
214 0 2 214.0 4113.62 0
214 0 4 214.0 4113.62 0
214 0 8 214.0 4113.62 0
214 2 2 107.0 2228.89 57
214 2 4 107.0 2228.89 57
214 2 8 107.0 2228.89 57
214 4 2 53.5 1223.72 93
214 4 4 53.5 1223.72 93
214 4 8 53.5 1223.72 93
214 8 2 26.8 764.11 109
214 8 4 26.8 764.11 109
214 8 8 26.8 764.11 109
214 16 2 13.4 610.39 141
214 16 4 13.4 610.39 141
214 16 8 13.4 610.39 141
214 32 2 6.7 749.06 205
214 32 4 6.7 749.06 205
214 32 8 214.0 4113.62 0
281 0 2 281.0 5501.04 0
281 0 4 281.0 5501.04 0
281 0 8 281.0 5501.04 0
281 2 2 140.5 2942.55 60
281 2 4 140.5 2942.55 60
281 2 8 140.5 2942.55 60
281 4 2 70.2 1566.05 93
281 4 4 70.2 1566.05 93
281 4 8 70.2 1566.05 93
281 8 2 35.1 941.46 109
281 8 4 35.1 941.46 109
281 8 8 35.1 941.46 109
281 16 2 17.6 697.67 141
281 16 4 17.6 697.67 141
281 16 8 17.6 697.67 141
281 32 2 8.8 779.29 205
281 32 4 8.8 779.29 205
281 32 8 8.8 779.29 205
364 0 2 364.0 7202.46 0
364 0 4 364.0 7202.46 0
364 0 8 364.0 7202.46 0
364 2 2 182.0 3843.45 58
364 2 4 182.0 3843.45 58
364 2 8 182.0 3843.45 58
364 4 2 91.0 2005.85 91
364 4 4 91.0 2005.85 91
364 4 8 91.0 2005.85 91
364 8 2 45.5 1162.98 107
364 8 4 45.5 1162.98 107
364 8 8 45.5 1162.98 107
364 16 2 22.8 817.09 139
364 16 4 22.8 817.09 139
364 16 8 22.8 817.09 139
364 32 2 11.4 842.67 203
364 32 4 11.4 842.67 203
364 32 8 11.4 842.67 203
494 0 2 494.0 9702.74 0
494 0 4 494.0 9702.74 0
494 0 8 494.0 9702.74 0
494 2 2 247.0 5128.48 58
494 2 4 247.0 5128.48 58
494 2 8 247.0 5128.48 58
494 4 2 123.5 2641.12 91
494 4 4 123.5 2641.12 91
494 4 8 123.5 2641.12 91
494 8 2 61.8 1468.60 107
494 8 4 61.8 1468.60 107
494 8 8 61.8 1468.60 107
494 16 2 30.9 970.34 139
494 16 4 30.9 970.34 139
494 16 8 30.9 970.34 139
494 32 2 15.4 920.82 203
494 32 4 15.4 920.82 203
494 32 8 15.4 920.82 203
Test results for find_applet_by_name
The columns are:
1 number of applets (NUM_APPLETS)
2 number of names that need to be tested (APPNAME_BLOCK_SIZE)
3 number of known offsets (NUM_APPLETS/APPNAME_BLOCK_SIZE)
4 average number of cycles per call
5 bloat in bytes relative to known offsets = 0
13 25 0 224.07 0
13 30 0 224.07 0
13 35 0 224.07 0
13 40 0 224.07 0
13 45 0 224.07 0
13 50 0 224.07 0
28 25 0 462.83 0
28 30 0 462.83 0
28 35 0 462.83 0
28 40 0 462.83 0
28 45 0 462.83 0
28 50 0 462.83 0
48 25 0 840.88 0
48 30 0 840.88 0
48 35 0 840.88 0
48 40 0 840.88 0
48 45 0 840.88 0
48 50 0 840.88 0
72 25 2 703.86 57
72 30 2 703.86 57
72 35 2 703.86 57
72 40 0 1212.37 0
72 45 0 1212.37 0
72 50 0 1212.37 0
96 25 3 678.07 90
96 30 3 678.07 90
96 35 2 941.25 57
96 40 2 941.25 57
96 45 2 941.25 57
96 50 0 1714.42 0
133 25 5 665.84 114
133 30 4 740.55 93
133 35 3 927.43 91
133 40 3 927.43 91
133 45 2 1353.95 57
133 50 2 1353.95 57
172 25 6 745.96 118
172 30 5 829.50 114
172 35 4 951.37 93
172 40 4 951.37 93
172 45 3 1225.35 91
172 50 3 1225.35 91
214 25 8 756.17 109
214 30 7 818.73 122
214 35 6 900.33 118
214 40 5 1028.08 114
214 45 4 1232.25 93
214 50 4 1232.25 93
281 25 11 798.10 138
281 30 9 863.99 130
281 35 8 949.66 109
281 40 7 1062.82 122
281 45 6 1170.01 118
281 50 5 1344.27 114
364 25 14 861.78 148
364 30 12 910.64 140
364 35 10 1016.15 132
364 40 9 1060.04 128
364 45 8 1162.96 107
364 50 7 1291.48 120
494 25 19 937.61 168
494 30 16 970.37 139
494 35 14 1052.08 148
494 40 12 1125.15 140
494 45 10 1272.46 132
494 50 9 1339.46 128
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox