[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011744#comment-17011744
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15213 at 1/9/20 12:27 PM:
-----------------------------------------------------------------------------

{quote}Fwiw, the proposed fast log code does improve performance of my local 
modifications when benchmarked but basically brings it back inline w/ the 
existing implementation (caveat: benchmarking issues already mentioned).
{quote}
Achieving equivalent performance in this synthetic benchmark is pretty much all 
we need, I think. This should predict a strict performance win once memory 
latency is taken into account, as our largest such array is ~1.2KiB, with 150 
elements, leading to a predicted 5 memory stalls for binary search - although 
execution will proceed speculatively and may mask one or so of those stalls. If 
the approximation can achieve an optimal 1 memory stall in the worst case, with 
similar performance when the entire structure is in cache, it’s a win in my 
book.
{quote}would result in several cases where a linear search would lead to over 
10-50 accesses
{quote}
Interesting, I’m not sure what’s happening in that case. For comparison, I ran 
the code below
{code:java}
long[] offsets = newOffsets(160, true);
for (int i = 0 ; i < offsets.length ; ++i)
{
    long start = i == 0 ? 0 : offsets[i - 1] + 1;
    long end = offsets[i];
    System.out.printf("%d %d %d %d %.1f %.1f %.1f %.1f\n", i, end, 
(int)fastLog12(start) - i, (int)fastLog12(end) - i, fastLog12(start), 
fastLog12(end), slowLog12(start), slowLog12(end));
}
{code}
Which yields the below results, with columns:
 # The index in the offsets array
 # The value associated with the bucket
 # The integer difference between fastLog(lowest) and this offset, where lowest 
is the smallest value we want to bucket at this offset
 # This integer difference between fastLog(highest)
 # fastLog(lowest)
 # fastLog(highest)
 # slowLog(lowest)
 # slowLog(highest)

It looks to me like the approximation is always within precisely 2 or 3 of the 
offset, but _ahead_ by 2 or 3, because the array doesn’t follow a strict 1.2 
logarithm given the first few buckets must increment by whole integers. So for 
values > 2, we can subtract 3 before walking forwards at most 1, incurring only 
a single memory stall. If we wanted to be really clever this could be 
implemented branchlessly, so that this memory latency wouldn’t pause further 
execution, or invalidate the pipeline, only consume some reorder buffer slots 
and register file entries.
{code:java}
0 0 0 0 0.0 0.0 -Infinity -Infinity
1 1 -1 -1 0.0 0.0 0.0 0.0
2 2 1 1 3.8 3.8 3.8 3.8
3 3 3 3 6.0 6.0 6.0 6.0
4 4 3 3 7.6 7.6 7.6 7.6
5 5 3 3 8.8 8.8 8.8 8.8
6 6 3 3 9.8 9.8 9.8 9.8
7 7 3 3 10.7 10.7 10.7 10.7
8 8 3 3 11.4 11.4 11.4 11.4
9 10 3 3 12.1 12.6 12.1 12.6
10 12 3 3 13.2 13.6 13.2 13.6
11 14 3 3 14.1 14.5 14.1 14.5
12 17 2 3 14.9 15.5 14.9 15.5
13 20 2 3 15.9 16.4 15.9 16.4
14 24 2 3 16.7 17.4 16.7 17.4
15 29 2 3 17.7 18.5 17.7 18.5
16 35 2 3 18.7 19.3 18.7 19.5
17 42 2 3 19.7 20.5 19.7 20.5
18 50 2 3 20.5 21.5 20.6 21.5
19 60 2 3 21.5 22.5 21.6 22.5
20 72 2 3 22.5 23.5 22.5 23.5
21 86 2 3 23.5 24.3 23.5 24.4
22 103 2 3 24.3 25.3 24.5 25.4
23 124 2 3 25.5 26.4 25.5 26.4
24 149 2 3 26.4 27.3 26.5 27.4
25 179 2 3 27.3 28.4 27.5 28.5
26 215 2 3 28.4 29.3 28.5 29.5
27 258 2 3 29.5 30.4 29.5 30.5
28 310 2 3 30.4 31.4 30.5 31.5
29 372 2 3 31.4 32.4 31.5 32.5
30 446 2 3 32.4 33.3 32.5 33.5
31 535 2 3 33.3 34.2 33.5 34.5
32 642 2 3 34.2 35.4 34.5 35.5
33 770 2 3 35.4 36.4 35.5 36.5
34 924 2 3 36.4 37.3 36.5 37.5
35 1109 2 3 37.3 38.4 37.5 38.5
36 1331 2 3 38.4 39.2 38.5 39.5
37 1597 2 3 39.2 40.2 39.5 40.5
38 1916 2 3 40.2 41.3 40.5 41.5
39 2299 2 3 41.3 42.2 41.5 42.5
40 2759 2 3 42.2 43.3 42.5 43.5
41 3311 2 3 43.3 44.3 43.5 44.5
42 3973 2 3 44.3 45.4 44.5 45.5
43 4768 2 3 45.4 46.3 45.5 46.5
44 5722 2 3 46.3 47.4 46.5 47.5
45 6866 2 3 47.4 48.3 47.5 48.5
46 8239 2 3 48.3 49.4 48.5 49.5
47 9887 2 3 49.4 50.4 49.5 50.5
48 11864 2 3 50.4 51.4 50.5 51.5
49 14237 2 3 51.4 52.3 51.5 52.5
50 17084 2 3 52.3 53.2 52.5 53.5
51 20501 2 3 53.2 54.4 53.5 54.5
52 24601 2 3 54.4 55.4 54.5 55.5
53 29521 2 3 55.4 56.3 55.5 56.5
54 35425 2 3 56.3 57.4 56.5 57.5
55 42510 2 3 57.4 58.3 57.5 58.5
56 51012 2 3 58.3 59.3 58.5 59.5
57 61214 2 3 59.3 60.3 59.5 60.5
58 73457 2 3 60.3 61.2 60.5 61.5
59 88148 2 3 61.2 62.3 61.5 62.5
60 105778 2 3 62.3 63.3 62.5 63.5
61 126934 2 3 63.3 64.3 63.5 64.5
62 152321 2 3 64.3 65.3 64.5 65.5
63 182785 2 3 65.3 66.4 65.5 66.5
64 219342 2 3 66.4 67.3 66.5 67.5
65 263210 2 3 67.3 68.4 67.5 68.5
66 315852 2 3 68.4 69.4 68.5 69.5
67 379022 2 3 69.4 70.4 69.5 70.5
68 454826 2 3 70.4 71.3 70.5 71.5
69 545791 2 3 71.3 72.2 71.5 72.5
70 654949 2 3 72.2 73.2 72.5 73.5
71 785939 2 3 73.2 74.2 73.5 74.5
72 943127 2 3 74.2 75.3 74.5 75.5
73 1131752 2 3 75.3 76.4 75.5 76.5
74 1358102 2 3 76.4 77.3 76.5 77.5
75 1629722 2 3 77.3 78.3 77.5 78.5
76 1955666 2 3 78.3 79.3 78.5 79.5
77 2346799 2 3 79.3 80.2 79.5 80.5
78 2816159 2 3 80.2 81.3 80.5 81.5
79 3379391 2 3 81.3 82.3 81.5 82.5
80 4055269 2 3 82.3 83.3 82.5 83.5
81 4866323 2 3 83.3 84.3 83.5 84.5
82 5839588 2 3 84.3 85.4 84.5 85.5
83 7007506 2 3 85.4 86.3 85.5 86.5
84 8409007 2 3 86.3 87.4 86.5 87.5
85 10090808 2 3 87.4 88.4 87.5 88.5
86 12108970 2 3 88.4 89.4 88.5 89.5
87 14530764 2 3 89.4 90.3 89.5 90.5
88 17436917 2 3 90.3 91.2 90.5 91.5
89 20924300 2 3 91.2 92.2 91.5 92.5
90 25109160 2 3 92.2 93.2 92.5 93.5
91 30130992 2 3 93.2 94.3 93.5 94.5
92 36157190 2 3 94.3 95.4 94.5 95.5
93 43388628 2 3 95.4 96.3 95.5 96.5
94 52066354 2 3 96.3 97.3 96.5 97.5
95 62479625 2 3 97.3 98.3 97.5 98.5
96 74975550 2 3 98.3 99.2 98.5 99.5
97 89970660 2 3 99.2 100.3 99.5 100.5
98 107964792 2 3 100.3 101.3 100.5 101.5
99 129557750 2 3 101.3 102.3 101.5 102.5
100 155469300 2 3 102.3 103.3 102.5 103.5
101 186563160 2 3 103.3 104.4 103.5 104.5
102 223875792 2 3 104.4 105.3 104.5 105.5
103 268650950 2 3 105.3 106.4 105.5 106.5
104 322381140 2 3 106.4 107.4 106.5 107.5
105 386857368 2 3 107.4 108.4 107.5 108.5
106 464228842 2 3 108.4 109.3 108.5 109.5
107 557074610 2 3 109.3 110.3 109.5 110.5
108 668489532 2 3 110.3 111.2 110.5 111.5
109 802187438 2 3 111.2 112.2 111.5 112.5
110 962624926 2 3 112.2 113.3 112.5 113.5
111 1155149911 2 3 113.3 114.4 113.5 114.5
112 1386179893 2 3 114.4 115.3 114.5 115.5
113 1663415872 2 3 115.3 116.3 115.5 116.5
114 1996099046 2 3 116.3 117.3 116.5 117.5
115 2395318855 2 3 117.3 118.2 117.5 118.5
116 2874382626 2 3 118.2 119.3 118.5 119.5
117 3449259151 2 3 119.3 120.3 119.5 120.5
118 4139110981 2 3 120.3 121.3 120.5 121.5
119 4966933177 2 3 121.3 122.3 121.5 122.5
120 5960319812 2 3 122.3 123.4 122.5 123.5
121 7152383774 2 3 123.4 124.3 123.5 124.5
122 8582860529 2 3 124.3 125.3 124.5 125.5
123 10299432635 2 3 125.3 126.4 125.5 126.5
124 12359319162 2 3 126.4 127.4 126.5 127.5
125 14831182994 2 3 127.4 128.3 127.5 128.5
126 17797419593 2 3 128.3 129.3 128.5 129.5
127 21356903512 2 3 129.3 130.2 129.5 130.5
128 25628284214 2 3 130.2 131.3 130.5 131.5
129 30753941057 2 3 131.3 132.3 131.5 132.5
130 36904729268 2 3 132.3 133.4 132.5 133.5
131 44285675122 2 3 133.4 134.3 133.5 134.5
132 53142810146 2 3 134.3 135.3 134.5 135.5
133 63771372175 2 3 135.3 136.3 135.5 136.5
134 76525646610 2 3 136.3 137.2 136.5 137.5
135 91830775932 2 3 137.2 138.4 137.5 138.5
136 110196931118 2 3 138.4 139.3 138.5 139.5
137 132236317342 2 3 139.3 140.3 139.5 140.5
138 158683580810 2 3 140.3 141.3 140.5 141.5
139 190420296972 2 3 141.3 142.4 141.5 142.5
140 228504356366 2 3 142.4 143.3 142.5 143.5
141 274205227639 2 3 143.3 144.3 143.5 144.5
142 329046273167 2 3 144.3 145.4 144.5 145.5
143 394855527800 2 3 145.4 146.2 145.5 146.5
144 473826633360 2 3 146.2 147.3 146.5 147.5
145 568591960032 2 3 147.3 148.3 147.5 148.5
146 682310352038 2 3 148.3 149.2 148.5 149.5
147 818772422446 2 3 149.2 150.3 149.5 150.5
148 982526906935 2 3 150.3 151.3 150.5 151.5
149 1179032288322 2 3 151.3 152.4 151.5 152.5
150 1414838745986 2 3 152.4 153.3 152.5 153.5
151 1697806495183 2 3 153.3 154.3 153.5 154.5
152 2037367794220 2 3 154.3 155.3 154.5 155.5
153 2444841353064 2 3 155.3 156.2 155.5 156.5
154 2933809623677 2 3 156.2 157.4 156.5 157.5
155 3520571548412 2 3 157.4 158.3 157.5 158.5
156 4224685858094 2 3 158.3 159.3 158.5 159.5
157 5069623029713 2 3 159.3 160.3 159.5 160.5
158 6083547635656 2 3 160.3 161.4 160.5 161.5
159 7300257162787 2 3 161.4 162.3 161.5 162.5
160 8760308595344 2 3 162.3 163.3 162.5 163.5
{code}


was (Author: benedict):
{quote}Fwiw, the proposed fast log code does improve performance of my local 
modifications when benchmarked but basically brings it back inline w/ the 
existing implementation (caveat: benchmarking issues already mentioned).
{quote}
Achieving equivalent performance in this synthetic benchmark is pretty much all 
we need, I think. This should predict a strict performance win once memory 
latency is taken into account, as our largest such array is ~1.2KiB, with 150 
elements, leading to a predicted 7 memory stalls for binary search - although 
execution will proceed speculatively and may mask one or so of those stalls. If 
the approximation can achieve an optimal 1 memory stall in the worst case, with 
similar performance when the entire structure is in cache, it’s a win in my 
book.
{quote}would result in several cases where a linear search would lead to over 
10-50 accesses
{quote}
Interesting, I’m not sure what’s happening in that case. For comparison, I ran 
the code below
{code:java}
long[] offsets = newOffsets(160, true);
for (int i = 0 ; i < offsets.length ; ++i)
{
    long start = i == 0 ? 0 : offsets[i - 1] + 1;
    long end = offsets[i];
    System.out.printf("%d %d %d %d %.1f %.1f %.1f %.1f\n", i, end, 
(int)fastLog12(start) - i, (int)fastLog12(end) - i, fastLog12(start), 
fastLog12(end), slowLog12(start), slowLog12(end));
}
{code}
Which yields the below results, with columns:
 # The index in the offsets array
 # The value associated with the bucket
 # The integer difference between fastLog(lowest) and this offset, where lowest 
is the smallest value we want to bucket at this offset
 # This integer difference between fastLog(highest)
 # fastLog(lowest)
 # fastLog(highest)
 # slowLog(lowest)
 # slowLog(highest)

It looks to me like the approximation is always within precisely 2 or 3 of the 
offset, but _ahead_ by 2 or 3, because the array doesn’t follow a strict 1.2 
logarithm given the first few buckets must increment by whole integers. So for 
values > 2, we can subtract 3 before walking forwards at most 1, incurring only 
a single memory stall. If we wanted to be really clever this could be 
implemented branchlessly, so that this memory latency wouldn’t pause further 
execution, or invalidate the pipeline, only consume some reorder buffer slots 
and register file entries.
{code:java}
0 0 0 0 0.0 0.0 -Infinity -Infinity
1 1 -1 -1 0.0 0.0 0.0 0.0
2 2 1 1 3.8 3.8 3.8 3.8
3 3 3 3 6.0 6.0 6.0 6.0
4 4 3 3 7.6 7.6 7.6 7.6
5 5 3 3 8.8 8.8 8.8 8.8
6 6 3 3 9.8 9.8 9.8 9.8
7 7 3 3 10.7 10.7 10.7 10.7
8 8 3 3 11.4 11.4 11.4 11.4
9 10 3 3 12.1 12.6 12.1 12.6
10 12 3 3 13.2 13.6 13.2 13.6
11 14 3 3 14.1 14.5 14.1 14.5
12 17 2 3 14.9 15.5 14.9 15.5
13 20 2 3 15.9 16.4 15.9 16.4
14 24 2 3 16.7 17.4 16.7 17.4
15 29 2 3 17.7 18.5 17.7 18.5
16 35 2 3 18.7 19.3 18.7 19.5
17 42 2 3 19.7 20.5 19.7 20.5
18 50 2 3 20.5 21.5 20.6 21.5
19 60 2 3 21.5 22.5 21.6 22.5
20 72 2 3 22.5 23.5 22.5 23.5
21 86 2 3 23.5 24.3 23.5 24.4
22 103 2 3 24.3 25.3 24.5 25.4
23 124 2 3 25.5 26.4 25.5 26.4
24 149 2 3 26.4 27.3 26.5 27.4
25 179 2 3 27.3 28.4 27.5 28.5
26 215 2 3 28.4 29.3 28.5 29.5
27 258 2 3 29.5 30.4 29.5 30.5
28 310 2 3 30.4 31.4 30.5 31.5
29 372 2 3 31.4 32.4 31.5 32.5
30 446 2 3 32.4 33.3 32.5 33.5
31 535 2 3 33.3 34.2 33.5 34.5
32 642 2 3 34.2 35.4 34.5 35.5
33 770 2 3 35.4 36.4 35.5 36.5
34 924 2 3 36.4 37.3 36.5 37.5
35 1109 2 3 37.3 38.4 37.5 38.5
36 1331 2 3 38.4 39.2 38.5 39.5
37 1597 2 3 39.2 40.2 39.5 40.5
38 1916 2 3 40.2 41.3 40.5 41.5
39 2299 2 3 41.3 42.2 41.5 42.5
40 2759 2 3 42.2 43.3 42.5 43.5
41 3311 2 3 43.3 44.3 43.5 44.5
42 3973 2 3 44.3 45.4 44.5 45.5
43 4768 2 3 45.4 46.3 45.5 46.5
44 5722 2 3 46.3 47.4 46.5 47.5
45 6866 2 3 47.4 48.3 47.5 48.5
46 8239 2 3 48.3 49.4 48.5 49.5
47 9887 2 3 49.4 50.4 49.5 50.5
48 11864 2 3 50.4 51.4 50.5 51.5
49 14237 2 3 51.4 52.3 51.5 52.5
50 17084 2 3 52.3 53.2 52.5 53.5
51 20501 2 3 53.2 54.4 53.5 54.5
52 24601 2 3 54.4 55.4 54.5 55.5
53 29521 2 3 55.4 56.3 55.5 56.5
54 35425 2 3 56.3 57.4 56.5 57.5
55 42510 2 3 57.4 58.3 57.5 58.5
56 51012 2 3 58.3 59.3 58.5 59.5
57 61214 2 3 59.3 60.3 59.5 60.5
58 73457 2 3 60.3 61.2 60.5 61.5
59 88148 2 3 61.2 62.3 61.5 62.5
60 105778 2 3 62.3 63.3 62.5 63.5
61 126934 2 3 63.3 64.3 63.5 64.5
62 152321 2 3 64.3 65.3 64.5 65.5
63 182785 2 3 65.3 66.4 65.5 66.5
64 219342 2 3 66.4 67.3 66.5 67.5
65 263210 2 3 67.3 68.4 67.5 68.5
66 315852 2 3 68.4 69.4 68.5 69.5
67 379022 2 3 69.4 70.4 69.5 70.5
68 454826 2 3 70.4 71.3 70.5 71.5
69 545791 2 3 71.3 72.2 71.5 72.5
70 654949 2 3 72.2 73.2 72.5 73.5
71 785939 2 3 73.2 74.2 73.5 74.5
72 943127 2 3 74.2 75.3 74.5 75.5
73 1131752 2 3 75.3 76.4 75.5 76.5
74 1358102 2 3 76.4 77.3 76.5 77.5
75 1629722 2 3 77.3 78.3 77.5 78.5
76 1955666 2 3 78.3 79.3 78.5 79.5
77 2346799 2 3 79.3 80.2 79.5 80.5
78 2816159 2 3 80.2 81.3 80.5 81.5
79 3379391 2 3 81.3 82.3 81.5 82.5
80 4055269 2 3 82.3 83.3 82.5 83.5
81 4866323 2 3 83.3 84.3 83.5 84.5
82 5839588 2 3 84.3 85.4 84.5 85.5
83 7007506 2 3 85.4 86.3 85.5 86.5
84 8409007 2 3 86.3 87.4 86.5 87.5
85 10090808 2 3 87.4 88.4 87.5 88.5
86 12108970 2 3 88.4 89.4 88.5 89.5
87 14530764 2 3 89.4 90.3 89.5 90.5
88 17436917 2 3 90.3 91.2 90.5 91.5
89 20924300 2 3 91.2 92.2 91.5 92.5
90 25109160 2 3 92.2 93.2 92.5 93.5
91 30130992 2 3 93.2 94.3 93.5 94.5
92 36157190 2 3 94.3 95.4 94.5 95.5
93 43388628 2 3 95.4 96.3 95.5 96.5
94 52066354 2 3 96.3 97.3 96.5 97.5
95 62479625 2 3 97.3 98.3 97.5 98.5
96 74975550 2 3 98.3 99.2 98.5 99.5
97 89970660 2 3 99.2 100.3 99.5 100.5
98 107964792 2 3 100.3 101.3 100.5 101.5
99 129557750 2 3 101.3 102.3 101.5 102.5
100 155469300 2 3 102.3 103.3 102.5 103.5
101 186563160 2 3 103.3 104.4 103.5 104.5
102 223875792 2 3 104.4 105.3 104.5 105.5
103 268650950 2 3 105.3 106.4 105.5 106.5
104 322381140 2 3 106.4 107.4 106.5 107.5
105 386857368 2 3 107.4 108.4 107.5 108.5
106 464228842 2 3 108.4 109.3 108.5 109.5
107 557074610 2 3 109.3 110.3 109.5 110.5
108 668489532 2 3 110.3 111.2 110.5 111.5
109 802187438 2 3 111.2 112.2 111.5 112.5
110 962624926 2 3 112.2 113.3 112.5 113.5
111 1155149911 2 3 113.3 114.4 113.5 114.5
112 1386179893 2 3 114.4 115.3 114.5 115.5
113 1663415872 2 3 115.3 116.3 115.5 116.5
114 1996099046 2 3 116.3 117.3 116.5 117.5
115 2395318855 2 3 117.3 118.2 117.5 118.5
116 2874382626 2 3 118.2 119.3 118.5 119.5
117 3449259151 2 3 119.3 120.3 119.5 120.5
118 4139110981 2 3 120.3 121.3 120.5 121.5
119 4966933177 2 3 121.3 122.3 121.5 122.5
120 5960319812 2 3 122.3 123.4 122.5 123.5
121 7152383774 2 3 123.4 124.3 123.5 124.5
122 8582860529 2 3 124.3 125.3 124.5 125.5
123 10299432635 2 3 125.3 126.4 125.5 126.5
124 12359319162 2 3 126.4 127.4 126.5 127.5
125 14831182994 2 3 127.4 128.3 127.5 128.5
126 17797419593 2 3 128.3 129.3 128.5 129.5
127 21356903512 2 3 129.3 130.2 129.5 130.5
128 25628284214 2 3 130.2 131.3 130.5 131.5
129 30753941057 2 3 131.3 132.3 131.5 132.5
130 36904729268 2 3 132.3 133.4 132.5 133.5
131 44285675122 2 3 133.4 134.3 133.5 134.5
132 53142810146 2 3 134.3 135.3 134.5 135.5
133 63771372175 2 3 135.3 136.3 135.5 136.5
134 76525646610 2 3 136.3 137.2 136.5 137.5
135 91830775932 2 3 137.2 138.4 137.5 138.5
136 110196931118 2 3 138.4 139.3 138.5 139.5
137 132236317342 2 3 139.3 140.3 139.5 140.5
138 158683580810 2 3 140.3 141.3 140.5 141.5
139 190420296972 2 3 141.3 142.4 141.5 142.5
140 228504356366 2 3 142.4 143.3 142.5 143.5
141 274205227639 2 3 143.3 144.3 143.5 144.5
142 329046273167 2 3 144.3 145.4 144.5 145.5
143 394855527800 2 3 145.4 146.2 145.5 146.5
144 473826633360 2 3 146.2 147.3 146.5 147.5
145 568591960032 2 3 147.3 148.3 147.5 148.5
146 682310352038 2 3 148.3 149.2 148.5 149.5
147 818772422446 2 3 149.2 150.3 149.5 150.5
148 982526906935 2 3 150.3 151.3 150.5 151.5
149 1179032288322 2 3 151.3 152.4 151.5 152.5
150 1414838745986 2 3 152.4 153.3 152.5 153.5
151 1697806495183 2 3 153.3 154.3 153.5 154.5
152 2037367794220 2 3 154.3 155.3 154.5 155.5
153 2444841353064 2 3 155.3 156.2 155.5 156.5
154 2933809623677 2 3 156.2 157.4 156.5 157.5
155 3520571548412 2 3 157.4 158.3 157.5 158.5
156 4224685858094 2 3 158.3 159.3 158.5 159.5
157 5069623029713 2 3 159.3 160.3 159.5 160.5
158 6083547635656 2 3 160.3 161.4 160.5 161.5
159 7300257162787 2 3 161.4 162.3 161.5 162.5
160 8760308595344 2 3 162.3 163.3 162.5 163.5
{code}

> DecayingEstimatedHistogramReservoir Inefficiencies
> --------------------------------------------------
>
>                 Key: CASSANDRA-15213
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Benedict Elliott Smith
>            Assignee: Jordan West
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to