Jim,

Here is some news before I leave for a long weekend:
> It uses FloatMath.ceil() that internally use the ceil_int() implementation for performance.
> I agree it should use directly the ceil_int() to be more explicit.

I did it and ran some tests with endRendering() disabled to evaluate the complete pipeline, including the addLine() cost:
- floating point: 1st map: 35ms/115ms = 30%; complex map: 180ms/780ms = 25%
- fixed point: 1st map: 40ms/120ms; complex map: 200ms/800ms

I can send you detailed results if you want.

Since this test only measures the impact of my changes to addLine(), the small slowdown is mainly due to its increased complexity: many ceil / floor calls plus math ops. So any effort spent on improving addLine() and curve decimation targets about 25% of the rendering time for complex maps.

>> Another technique to try would be to use longs which would involve a 64-bit shift to get the integer part, but there is already a 32-bit shift to add the error overflow anyway.

I quickly tried getLong/putLong, but packing/unpacking the integers seems about 3% slower, not faster. I will send you that Renderer variant next week so you can have a look.

> I may try as a last chance if removing Unsafe usage is not faster.
> I really like this approach as it will remove a lot of code = Unsafe usage + OffHeapEdgeArray + dispose / cleaner thread.
> Moreover, hotspot may optimize more such normal array accesses than Unsafe calls (intrinsics); however, it may also introduce array bound checks ...

I made that variant too: it is about 1% slower than the Unsafe version, not faster. However, the code is a lot more readable, and the performance difference is too small to justify using Unsafe (and I hit many segfaults while making changes...). I also tested cache-line (32 per edge) and page-size (4k) alignment on the Unsafe variant, without any gain. Bounds checks are probably causing the minor slowdown, but the array variant is safer. I will send you a webrev soon so you can see the details.
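To make the long-based experiment concrete, here is a minimal sketch of what packing a 32.32 value into a single long involves. The class and method names are hypothetical and this is not the Renderer code; it only illustrates the shift and mask work behind "packing/unpacking", and how the carry from the low word reaches the integer part without a separate overflow step.

```java
// Hypothetical sketch: names are illustrative, not taken from the Renderer.
final class PackedFixedPoint {

    /** Packs an integer part (high 32 bits) and a fraction/error term (low 32 bits). */
    static long pack(int intPart, int frac) {
        // Mask the low word so a negative frac does not sign-extend into the high word.
        return ((long) intPart << 32) | (frac & 0xFFFFFFFFL);
    }

    /** Integer part: the arithmetic 64-bit shift preserves the sign. */
    static int intPart(long packed) {
        return (int) (packed >> 32);
    }

    /** Fraction/error term: the narrowing cast keeps the low 32 bits. */
    static int frac(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        long x = pack(1, 0x80000000);    // 1.5 in 32.32
        long step = pack(0, 0x80000000); // 0.5 in 32.32
        long sum = x + step;             // the carry propagates into the high word
        System.out.println(intPart(sum) + " + " + frac(sum) + "/2^32");
    }
}
```

One 64-bit add replaces the separate add plus overflow handling, but every read of the integer part costs a shift and a cast, which is one plausible place for a small slowdown to come from.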
A few ideas to discuss:

1/ I now wonder whether the gridding = ceil(x - 0.5), ceil(y - 0.5) should be done differently: why not apply the -0.5 offset to the points before curve decimation or adding lines? It may save a lot of subtractions: addLine(x1, y1, x2, y2) implies 4 subtractions, whereas lineTo(x2, y2) only needs to adjust the last point. The same goes for curve decimation: shifting the points may help.

- Do you know if breakCurveAndAddLines (quad or cubic) really takes the supersampling scale into account, so that it generates only the segments needed and no more?

- I use fixed point (32.32 + error) as you did, but it may be too precise: the slope, bumpx and error could be determined directly from the integer coordinates of the starting / ending points = ceil(x1 - 0.5), ceil(y1 - 0.5).

Any advice?

Cheers,
Laurent
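
PS: to make idea 1/ concrete, here is a minimal sketch, assuming a much simplified path consumer. The class GriddingSketch, its fields and ceilInt() are hypothetical names, not the actual Renderer or FloatMath code, and ceilInt() assumes finite values within the int range. It only shows that shifting each incoming point once means addLine() can apply ceil() directly.

```java
// Hypothetical sketch: a simplified path consumer, not the actual Renderer.
final class GriddingSketch {

    /** Cast-based ceil; assumes a finite value that fits in an int. */
    static int ceilInt(float a) {
        final int i = (int) a;       // truncates toward zero
        return (a > i) ? i + 1 : i;  // round up only if something was cut off above i
    }

    private float lastX, lastY;      // last point, already shifted by -0.5
    int subtractions;                // counts the -0.5 offsets applied so far
    int firstRow, lastRow;           // gridded rows of the most recent segment

    void moveTo(float x, float y) {
        lastX = x - 0.5f;
        lastY = y - 0.5f;
        subtractions += 2;
    }

    void lineTo(float x, float y) {
        final float sx = x - 0.5f;   // shift only the new end point
        final float sy = y - 0.5f;
        subtractions += 2;
        addLine(lastX, lastY, sx, sy);
        lastX = sx;
        lastY = sy;
    }

    /** Receives pre-shifted coordinates, so gridding is a plain ceil(). */
    private void addLine(float x1, float y1, float x2, float y2) {
        firstRow = ceilInt(y1);
        lastRow = ceilInt(y2);
    }

    public static void main(String[] args) {
        GriddingSketch g = new GriddingSketch();
        g.moveTo(0f, 0.25f);
        g.lineTo(3f, 2.75f);
        g.lineTo(5f, 4.6f);
        System.out.println("rows " + g.firstRow + ".." + g.lastRow
                + ", subtractions = " + g.subtractions);
    }
}
```

With two segments this performs 6 subtractions (2 per incoming point), where offsetting inside addLine() would need 4 per segment, so 8.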