Phil, I looked at the hostpot in src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c (75% cpu time) and its sort algorithm looks like an insertion sort ... If you could give me some explanations (or documentation), I could try optimizing this method.
Do you know if it uses an Active Edge Table (AET) or it traverses all segments every time ? i.e. segmentTable contains only ACTIVE segments or all ? static jboolean ShapeSINextSpan(void *state, jint spanbox[]) { pathData *pd = (pathData *)state; int lo, cur, new, hi; int num = pd->numSegments; jint x0, x1, y0, err; jint loy; int ret = JNI_FALSE; segmentData **segmentTable; segmentData *seg; if (pd->state != STATE_SPAN_STARTED) { if (!initSegmentTable(pd)) { /* REMIND: - throw exception? */ pd->lowSegment = num; return JNI_FALSE; } } lo = pd->lowSegment; cur = pd->curSegment; hi = pd->hiSegment; num = pd->numSegments; loy = pd->loy; segmentTable = pd->segmentTable; while (lo < num) { if (cur < hi) { seg = segmentTable[cur]; x0 = seg->curx; if (x0 >= pd->hix) { cur = hi; continue; } if (x0 < pd->lox) { x0 = pd->lox; } if (pd->evenodd) { cur += 2; if (cur <= hi) { x1 = segmentTable[cur - 1]->curx; } else { x1 = pd->hix; } } else { int wind = seg->windDir; cur++; while (JNI_TRUE) { if (cur >= hi) { x1 = pd->hix; break; } seg = segmentTable[cur++]; wind += seg->windDir; if (wind == 0) { x1 = seg->curx; break; } } } if (x1 > pd->hix) { x1 = pd->hix; } if (x1 <= x0) { continue; } spanbox[0] = x0; spanbox[1] = loy; spanbox[2] = x1; spanbox[3] = loy + 1; ret = JNI_TRUE; break; } if (++loy >= pd->hiy) { lo = cur = hi = num; break; } /* Go through active segments and toss which end "above" loy */ cur = new = hi; while (--cur >= lo) { seg = segmentTable[cur]; if (seg->lasty > loy) { segmentTable[--new] = seg; } } lo = new; if (lo == hi && lo < num) { /* The current list of segments is empty so we need to * jump to the beginning of the next set of segments. * Since the segments are not clipped to the output * area we need to make sure we don't jump "backwards" */ seg = segmentTable[lo]; if (loy < seg->cury) { loy = seg->cury; } } /* Go through new segments and accept any which start "above" loy */ while (hi < num && segmentTable[hi]->cury <= loy) { hi++; } /* Update and sort the active segments by x0 */ for (cur = lo; cur < hi; cur++) { seg = segmentTable[cur]; /* First update the x0, y0 of the segment */ x0 = seg->curx; y0 = seg->cury; err = seg->error; if (++y0 == loy) { x0 += seg->bumpx; err += seg->bumperr; x0 -= (err >> 31); err &= ERRSTEP_MAX; } else { jlong steps = loy; steps -= y0 - 1; y0 = loy; x0 += (jint) (steps * seg->bumpx); steps = err + (steps * seg->bumperr); x0 += (jint) (steps >> 31); err = ((jint) steps) & ERRSTEP_MAX; } seg->curx = x0; seg->cury = y0; seg->error = err; * /* Then make sure the segment is sorted by x0 */ for (new = cur; new > lo; new--) { segmentData *seg2 = segmentTable[new - 1]; if (seg2->curx <= x0) { break; } segmentTable[new] = seg2; }* segmentTable[new] = seg; } cur = lo; } pd->lowSegment = lo; pd->hiSegment = hi; pd->curSegment = cur; pd->loy = loy; return ret; } Cheers, Laurent Le ven. 12 oct. 2018 à 07:04, Laurent Bourgès <bourges.laur...@gmail.com> a écrit : > Phil, > > It reminds me I have rewritten in Marlin renderer the crossing sort at > every scanline. > Pisces was using a trivial insertion sort but it became very slow when the > crossing count is large. > I adopted a special merge sort as crossings are mostly ordered: big win. > > What sort algo is in action in drawPolyLine ? > > Cheers, > Laurent > > Le ven. 12 oct. 2018 à 01:15, Phil Race <philip.r...@oracle.com> a écrit : > >> >> In my previous email I was asking only about the "older" system, >> precisely because as you confirm below, I wanted to know that >> it was operating on an unscaled graphics. >> >> It is being triggered by the scale. If you add : >> graphics.scale(1.25, 1.25) >> in your application and run on 8 you'll see the window size is >> not changed but the contents are and the test now runs slowly >> like the JDK 9+ case. >> >> I think most primitives (text, images, fills, gradients, untransformed >> rectangle drawing) will be only slightly slower. The same if >> you were drawing anti-aliased lines - they are going to be slow already >> by comparison. >> >> A few similar primitives (drawArc, drawOval, drawPolygon ..) may be >> similarly >> affected but drawPolyLine even has dedicated loops for single pixel wide >> lines >> so may be the most affected when these loops can't be used. >> >> So this is a kind of worse case difference. Untransformed, aliased lines >> are super fast. >> Once you do anything like add anti-aliasing or a transform, they get >> slower. >> >> Note: hidpi does not mean that acceleration is "turned off", rather that >> some operations can no longer be sent to the accelerated pipeline, either >> it doesn't support that mode, or we haven't implemented the necessary code >> to invoke it for that mode. >> >> In Peter's case on Intel there will be no acceleration, since we do not >> enable D3D >> on Intel graphics cards. But on my system the time is identical whether >> I use D3D or not. >> >> But there is something else going on here too. >> Peter's test use 2^16 line segments. >> >> On my windows system at 1.25 scale, this takes 55 seconds to run. >> But 2^15 line segments completes in 10 seconds. >> So 2 x the no. of lines takes approx 5 times as long to run .. >> >> I have a modified version of Peter's program which breaks the polyline >> array >> into subarrays which get passed to multiple calls to drawPolyline. >> It misses joining the last point in ARR[N] to the first point in >> ARR[N+1] but >> I think that should not make much difference but if someone wants to use >> that in a real app they'll need to handle it. >> >> What I see is that using the smaller arrays makes a big difference. >> >> So instead of 60 seconds to draw one 65,536 element polyline, to >> draw 64 polylines of 1,024 elements takes just 1.1 seconds. >> Still not 0.05 but better. >> From what I can see it is being turned into a huge GeneralPath and >> rendered as a Shape. Multiple smaller shapes perform better. >> Perhaps we can add a loop that is specific to polygons that will handle >> this better, but that isn't likely to be jumped on .. and obviously >> it will never be *as* fast as narrow lines. >> >> -phil. >> >> >> >> On 10/11/2018 04:26 AM, Peter Hull wrote: >> > Hi Laurent, >> > Thanks for the detailed explanation. I quickly checked on the older >> > Windows system and the Java 11 window was the same size as the Java 8 >> > one, implying no scaling was going on (I guess just because it has a >> > lower resolution monitor) - so that confirms your hypothesis. >> > >> > If I use -Dsun.java2d.uiScale=1.0 that's OK for my laptop, it doesn't >> > matter if the window is a bit small. However I believe some higher end >> > systems have much higher scaling factors (2x, 3x?). Is there a general >> > way to specify a 1px line regardless of scaling, because in my case I >> > don't mind too much if it's a 'hair-line'? >> > >> > By the way, my actual application doesn't have 65000 lines but it >> > draws 3 graphs with about 3000 points, which makes it noticeably slow >> > when resizing the Window. I suppose I should look into cutting down >> > the number of points somehow... >> > >> > Pete >> >> -- -- Laurent Bourgès