Jim, Andrea & java2d members, I am happy to announce an updated Pisces patch that is faster again:
- Patched Pisces vs OpenJDK Pisces (ref): ~2.5 to 4.5 times faster

  score small:
    1T  20   248,04%   247,90%   464,65%   248,04%  *253,49%*  232,64%  207,77%
    2T  40   276,49%   276,09%  1317,15%   279,32%  *308,52%*  251,96%  288,31%
    4T  80   295,18%   295,49%   629,06%   298,08%  *316,24%*  269,51%  181,64%

  score big:
    1T  20   356,13%   356,44%  1862,18%   356,47%  *360,04%*  345,63%  360,26%
    2T  40   413,56%   414,14%   350,96%   414,06%  *411,88%*  412,23%  385,51%
    4T  80   458,96%   459,48%   941,17%   459,68%  *467,40%*  425,12%  450,10%

- Patched Pisces vs Oracle JDK 8 (ductus): ~equal (1T), ~60% faster (2T),
  ~2 to 3 times faster (4T)

  score small:
    1T  20    94,02%    93,58%    61,96%    93,53%   *92,77%*   93,69%  128,83%
    2T  40   138,06%   137,95%   763,67%   140,09%  *157,44%*  102,14%  183,03%
    4T  80   179,10%   179,17%   494,78%   182,03%  *198,80%*  119,86%  176,89%

  score big:
    1T  20   122,67%   122,69%   112,98%   122,69%  *122,67%*  122,70%  122,23%
    2T  40   173,02%   173,17%   335,41%   173,50%  *178,99%*  160,51%  175,63%
    4T  80   325,52%   326,50%   574,24%   326,59%  *330,57%*  226,20%  321,69%

JAVA_OPTS="-server -XX:+PrintCommandLineFlags -XX:-PrintFlagsFinal -XX:-TieredCompilation"
JAVA_TUNING="-Xms128m -Xmx128m"

Full results:
http://jmmc.fr/~bourgesl/share/java2d-pisces/compareRef_Patch_2.ods
http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_opt_05_05_20s.log
http://jmmc.fr/~bourgesl/share/java2d-pisces/ductus_tests_10s.log
http://jmmc.fr/~bourgesl/share/java2d-pisces/ref_test_long.log

Here is the updated Pisces patch:
http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-4/

Changes:
- PiscesCache: use rowAAStride[32][x0; x1; alpha sum(x)] to consume the alpha
  data directly instead of encoding / decoding RLE data
- fixed PiscesTileGenerator.getAlpha() to read rowAAStride directly and
  efficiently
- Renderer: the edges array is split into edges [CURX, SLOPE] / edgesInt
  [NEXT, YMAX, OR] to avoid float / int conversions (see the sketch after
  this list)
- added "monitors", i.e. custom CPU / statistics probes, to gather usage
  statistics and CPU timings with minimal overhead: enable them with the
  PiscesConst.doMonitors flag
- minor tweaks
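To illustrate that change, here is a minimal, self-contained sketch (my own
illustration, not the patch code: the class name, field offsets and initial
capacity are assumptions) of keeping each edge's float fields and int fields
in two parallel arrays so the hot loops never cast between float and int:

    import java.util.Arrays;

    // Sketch of the edges / edgesInt split (illustrative only, not the patch):
    // each edge stores its float fields (current X, slope) in a float[] and
    // its int fields (next edge index, yMax, orientation) in an int[], so
    // stepping an edge or walking a bucket list never converts float <-> int.
    final class EdgeArrays {
        // offsets of the float fields of one edge
        static final int CURX = 0, SLOPE = 1, FLOAT_FIELDS = 2;
        // offsets of the int fields of one edge
        static final int NEXT = 0, YMAX = 1, OR = 2, INT_FIELDS = 3;

        private float[] edges    = new float[FLOAT_FIELDS * 1024];
        private int[]   edgesInt = new int[INT_FIELDS * 1024];
        private int     numEdges = 0;

        /** Adds an edge and returns its index. */
        int addEdge(float curX, float slope, int next, int yMax, int orientation) {
            final int e = numEdges++;
            if ((e + 1) * FLOAT_FIELDS > edges.length) {
                // grow both arrays in lockstep so one index addresses both parts
                edges    = Arrays.copyOf(edges,    edges.length * 2);
                edgesInt = Arrays.copyOf(edgesInt, edgesInt.length * 2);
            }
            edges[e * FLOAT_FIELDS + CURX]  = curX;
            edges[e * FLOAT_FIELDS + SLOPE] = slope;
            edgesInt[e * INT_FIELDS + NEXT] = next;
            edgesInt[e * INT_FIELDS + YMAX] = yMax;
            edgesInt[e * INT_FIELDS + OR]   = orientation;
            return e;
        }

        /** Advances the current X of edge e by one scanline: float math only. */
        float stepEdge(int e) {
            return edges[e * FLOAT_FIELDS + CURX] += edges[e * FLOAT_FIELDS + SLOPE];
        }

        /** Follows the bucket linked list from edge e: int reads only. */
        int nextEdge(int e) {
            return edgesInt[e * INT_FIELDS + NEXT];
        }
    }

Since the two arrays always grow together, a single edge index addresses the
float part and the int part of the same edge.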
Remaining tasks:
- basic clipping algorithm to handle trivial shape or line rejection when no
  affine transform, or only a simple one (scaling), is in use
- enhance curve / round caps and joins handling to take the spatial resolution
  into account: for example, round caps covering less than 2 AA pixels are
  visually useless and counter-productive (CPU cost); to be discussed
- cleanup / indentation (still in progress)

Jim, I found a few bugs / mistakes related to bbox + 1 (PiscesCache) and the
alpha array (+ 1). I agree that pixel coordinates / edges / crossings should
be converted to integers in a uniform manner (consistent and more accurate);
to be discussed, and still to be fixed.

Finally, I updated MapBench & MapDisplay:
http://jmmc.fr/~bourgesl/share/java2d-pisces/MapBench/

New features:
- each test runs for at least 5 s (configurable as the first CLI argument) to
  ensure enough runs to compute accurate averages and statistics
- scaling / translation tests (affineTransform); clipping tests in progress
- monitors are flushed after each test

Jim, a few comments below:

2013/5/4 Jim Graham <james.gra...@oracle.com>

>> I am perplexed and I am going to check the Pisces code against the
>> approach you gave.
>
> If for no other reason than to make sure that there aren't two parts of
> the system trying to communicate with different philosophies. You don't
> want a caller to hand a closed interval to a method which treats the
> values as half-open, for instance. If the rounding is "different, but
> consistent", then I think we can leave it for now and treat it as a
> future refinement to check if it makes any practical difference and
> correct. But, if it shows up a left-hand-not-talking-to-right-hand bug,
> then that would be good to fix sooner rather than later.

As said before, the minor bugs are:
- alpha array (Renderer) handling seems to go past its upper limit: I need to
  clear it up to pix_to + 1 + 1 (pix_to inclusive)!
- edge / crossing coordinate rounding: fix the bias to 0.5, i.e. ceil(x - 0.5)

> I think it is OK to focus on your current task of performance and memory
> turmoil, but I wanted to give you the proper background to try to
> understand what you were reading primarily, and possibly to get you
> interested in cleaning up the code as you went as a secondary
> consideration.

Agreed. Could you explain a bit how the renderer's scanline algorithm handles
crossings in the next() and _endRendering() methods?

>> If every coordinate has already been biased by the -0.5 then ceil is
>> just the tail end of the rounding equation I gave above.
>>
>> That's not the case => buggy: x1, y1 and x2, y2 are directly the point
>> coordinates as float values.
>
> Then using the ceil() on both is still consistent with half-open
> intervals, it just has a different interpretation of where the sampling
> cut-off lies within the subpixel sample. When you determine where the
> "crossings" lie, then it would be proper to do the same ceil(y +/- some
> offset) operation to compute the first crossing that is included and the
> first crossing that is excluded.

Ok.

> In this case it appears that the offset is just 0.0 which doesn't really
> meet my expectations, but is a minor issue. These crossings then become
> a half-open interval of scanline indices in which to compute the value.

To be fixed soon.
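To make the rounding convention concrete, here is a small sketch (again my own
illustration, not the Pisces code: the method names and example values are
assumptions) of deriving a half-open range of scanline indices from a float
y-range with the ceil(y - 0.5) form discussed above, so that every scanline is
sampled at its sub-pixel center and the same operation yields both the first
included and the first excluded crossing:

    // Half-open crossing ranges with a 0.5 sampling bias (sketch only).
    final class CrossingRange {

        /** Index of the first scanline whose center (i + 0.5) lies at or above y0. */
        static int firstCrossing(float y0) {
            return (int) Math.ceil(y0 - 0.5f);
        }

        /** Exclusive end: first scanline whose center lies at or above y1. */
        static int crossingEnd(float y1) {
            return (int) Math.ceil(y1 - 0.5f);
        }

        public static void main(String[] args) {
            // An edge covering y in [1.2, 3.7) crosses the sample centers 1.5,
            // 2.5 and 3.5, i.e. the half-open index range [1, 4).
            final float y0 = 1.2f, y1 = 3.7f;
            for (int i = firstCrossing(y0); i < crossingEnd(y1); i++) {
                System.out.println("scanline " + i + " sampled at y = " + (i + 0.5f));
            }
        }
    }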
>> I think rounding errors can lead to pixel / shape rasterization
>> deformations ... ?
>
> As long as the test is "y < _edges[ptr+YMAX]" then that is consistent
> with a half-open interval sampled at the top of every sub-pixel region,
> isn't it?

Ok.

> I agree with the half-open part of it, but would have preferred a
> "center of sub-pixel" offset for the actual sampling.

Agreed, again.

>> I am a bit embarrassed to verify the maths performed in
>> ScanLineIterator.next(), which uses the edges and edgeBucketCounts
>> arrays ... could you have a look?
>> Apparently, it uses the following for loop that respects the semi-open
>> interval philosophy:
>>     for (int i = 0, ecur, j, k; i < count; i++) {
>>     ...
>
> I'll come back to that at a later time, but it sounds like you are
> starting to get a handle on the design here.

Thanks.

>>     boolean endRendering() {
>>         // TODO: perform shape clipping to avoid dealing with segments
>>         // out of the bounding box
>>
>>         // Ensure shape edges are within bbox:
>>         if (edgeMinX > edgeMaxX || edgeMaxX < 0f) {
>>             return false; // undefined X bounds or negative Xmax
>>         }
>>         if (edgeMinY > edgeMaxY || edgeMaxY < 0f) {
>>             return false; // undefined Y bounds or negative Ymax
>>         }
>>
>> I'd use min >= max since if min==max then I think nothing gets
>> generated as a result of all edges having both the in and out crossings
>> on the same coordinate. Also, why not test against the clip bounds
>> instead? The code after that will clip the edgeMinMaxXY values against
>> the boundsMinMax values. If you do this kind of test after that
>> clipping is done (on spminmaxxy) then you can discover if all of the
>> coordinates are outside the clip or the region of interest.
>>
>> I tried here to perform a few "fast" checks before doing the float to
>> int conversions (costly because millions of them are performed): I
>> think it can still be improved: the edgeMinX > edgeMaxX test only
>> ensures that edgeMinX is defined and that both are positive!
>
> endRendering is called once per shape. I don't think moving tests above
> its conversions to int will affect our throughput compared to
> calculations done per-vertex.

Agreed, but having a small gain for each shape is still interesting when
thousands or millions of shapes are rendered!

> I'm going to have to review the rest of this email at a future time, my
> apologies...

Looking forward to hearing from you soon and to more feedback on the latest
changes.

Regards,
Laurent