Jim,

Thanks a lot for your efforts in understanding my approach; I admit it looks tricky, as I spent a lot of time optimizing every detail over the last 2 years. As I have reached Ductus performance levels with pure Java code, I agree other improvements are more interesting than worrying about losing a few percent.
I will now try to be more open-minded about your valuable comments, to improve code maintainability without sacrificing too much performance. I will reply to your previous message tonight.

If you are OK now, can I push the patch including the 2 syntax fixes? I need to create a new bug id (to be able to push the changeset)... Should I wait until you merge the gr repo with the latest jdk changes?

I will then focus on refactoring the array cache (and properly resetting state) + optimizing FloatMath (ready).

As you wonder whether the initial arrays are still necessary, I will run some benchmarks tonight with initial arrays = [0]. I think the 30% overhead dates from a long time ago and should be smaller now (dirty cache). However, the cache code does represent some overhead = get/grow/put + more resizing (copying) than the initial arrays...

More... later.

Best regards,
Laurent

On 18 June 2015 at 01:30, "Jim Graham" <james.gra...@oracle.com> wrote:
>
> OK, the new array stuff looks good to go. There are still those 2 minor issues from before, but that's it:
>
> CollinearSimplifier.getNativeConsumer() => return 0
> FloatMath => parens around (~intpart) at 145.
>
> I don't need to see a webrev for those changes...
>
> ...jim
>
>
> On 6/17/15 1:43 PM, Jim Graham wrote:
>>
>> Hi Laurent,
>>
>> There were 1 or 2 very minor fixes I suggested in a couple of files that you indicated you were going to do, but I didn't see a new webrev. Is there one? I think the existing array stuff is probably OK and I'll look at it again now that I understand the tradeoffs of its current design, but I wanted to know if there was a newer webrev for me to review based on those other minor fixes?
>>
>> On 6/12/15 5:45 AM, Laurent Bourgès wrote:
>>>
>>> A general thought - should we have foo_initial arrays if we are getting everything from a cache?
>>>
>>> I adopted this approach to maximize performance for the 99% case: I sized the initial arrays properly and they take ~512 KB per RendererContext; the ArrayCache (weak) is only used for extreme cases (a bit slower). In March 2013, I started by using the cache for all array accesses and it was 30% slower!
>>
>>
>> Was the 30% back when we were allocating or clearing all of the arrays? Now that we only clear what we use, the initial sizes of the arrays shouldn't matter so much. It would be worth a little extra size overhead to not have so many special cases and special handling for the "_initial" arrays if the performance is now essentially identical.
>>
>>> I will look again at these cases: I agree "leaking refs" should be set to null to be sure. However, Marlin code always executes the init()/dispose() pattern, so leaking pointers are not critical, but OK, let's set all of them to null.
>>>
>>> For initial arrays, it seems redundant to set them to null in dispose() and set them to the initial arrays in init() = useless and counter-productive!
>>
>>
>> It gets easier to review and maintain code if there is a clear reset to some initial conditions, is all. I would like to try to keep the code as simple and straightforward as possible while evaluating algorithmic improvements, without introducing potential bugs with a lot of special-case code, and then we can worry about the 1%'s later.
>>
>> Also, we may find moving the code to native to be more of a performance improvement than trying to fight with Hotspot's load/store policies, and so any code that tries to produce "slightly more optimal byte codes" would end up not being needed and would add some noise to any such native conversion.
>>
>> But, I do agree that technically you are correct about some of these suggestions leading to some provably redundant code.
>> The issue is more that constantly having to re-prove the redundancies gets in the way of other progress we could be making.
>>
>> And, on the other hand, right now you are the engineer driving this from the workload point of view (and I apologize if I am inadvertently in a role that seems best described as "the engineer getting in the way"), so I'm willing to rely on your commitment to be diligent about these optimizations. I'm just stating a case for whether they should be the focus yet.
>>
>>> Sorry, it is not a flaw but an advantage: the initial arrays that can grow in widenXXXArray() have an odd size, as indicated by the comment:
>>> // +1 to avoid recycling in Helpers.widenArray()
>>> private final int[] edgePtrs_initial = new int[INITIAL_SMALL_ARRAY + 1]; // 4K
>>
>>
>> I saw that after the fact, but my main point remains that this is a fragile inter-module optimization. I was unaware that it could save us 30% to have special-cased "initial" arrays, and that fact makes this relationship that much more important, but I wonder if it is still true. If we ditch the initial arrays because the new "dirty array" management removes that performance penalty, then this all goes away. And, even if we keep the initial-array concept, it would be nice to find a more reliable way to manage it than a side effect of the array's length.
>>
>>> I wanted to have a simple but robust API that can be easily used in Marlin.
>>
>>
>> I'd like to think that I'm working towards/promoting simple and robust APIs as well.
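To make the length side effect discussed above concrete, here is a simplified sketch (not the actual Marlin code; the widenArray and recycle shapes below are hypothetical) of how an odd length can mark an "initial" array so that the widen step never puts it back into the cache:

```java
// Sketch of the odd-length convention: initial arrays are allocated with
// an odd length, and widenArray() only recycles arrays whose length is
// even, so non-cached "initial" arrays never enter the cache.
final class WidenSketch {
    static int recycled = 0;

    static int[] widenArray(int[] in, int usedSize, int newSize) {
        final int[] out = new int[newSize];
        System.arraycopy(in, 0, out, 0, usedSize);
        // odd length marks a non-cached "initial" array: skip recycling
        if ((in.length & 1) == 0) {
            recycle(in);
        }
        return out;
    }

    static void recycle(int[] array) {
        recycled++; // stand-in for returning the array to the cache
    }

    public static void main(String[] args) {
        int[] initial = new int[1024 + 1]; // odd: never recycled
        widenArray(initial, 0, 4096);
        int[] cached = new int[2048];      // even: recycled on growth
        widenArray(cached, 0, 4096);
        System.out.println(recycled);      // prints 1
    }
}
```

This shows why the coupling is fragile: nothing except the `+ 1` at the allocation site and the parity test in the widen step encodes the agreement.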
>>
>>> I agree the documentation can be improved a lot, and I could introduce a new helper method in ArrayCache to create the initial arrays in a safer and explicit way:
>>>
>>> int[] createNonCachedIntArray(int length) {
>>>     // return only odd lengths to detect wrong cache usage
>>>     if ((length & 1) == 0) {
>>>         length += 1;
>>>     }
>>>     return new int[length];
>>> }
>>
>>
>> In a future effort, if we find that we must keep some small initial arrays for performance, we could make this a more explicit and tunable parameter on the caches. Let's table that until we consider whether they buy us any performance after these dirty-array changes.
>>
>>> Anyway, I agree the cache handling can be improved a lot: try using generics for the array types with a lambda factory to allocate them.
>>
>>
>> I just wish that generics could handle primitive arrays, but they can't. I did a similar generification thing for the raster format conversion code in FX, but I had to play games with the generic types being ByteBuffer and IntBuffer and having the variants of those accept the proper type of primitive arrays. It wasn't pretty, but it did work out in the end.
>>
>>> The caching code is an example of "code with no obvious flaws because it is hard to figure out if it has flaws". We should strive for "code with obviously no flaws because it is easy to verify that it is doing everything correctly".
>>>
>>> I disagree with the term "flaw", but I agree it is hard to figure out what is done.
>>
>>
>> I took that from a talk on producing more secure code. I think it originally came from an overall system design that could easily be proven secure. The first release of Java required all sensitive Java code to be constantly asking "am I in a context where this operation is allowed?", and the detection was fragile.
>> It was replaced by code where all sensitive operations were rejected unless the surrounding code proved its protected context, which was much more secure. "Disallowed unless requested" was harder to get wrong than "allowed unless it looks questionable".
>>
>> But that same philosophy serves a lot of code well. If you start having a bunch of "agreements" between modules, then they can quickly get out of sync through regular maintenance and sometimes spiral out of control, and you spend more time verifying that you're still satisfying all of these conventions rather than letting the code make sure things are kosher because only valid operations are expressible.
>>
>>> OK, I can work later on a new framework:
>>> - ArrayCache<K> abstract class where <K> is byte[], int[], float[]
>>> - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
>>> - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
>>
>>
>> I think I can trust the existing architecture for at least a putback of the last webrev that you sent out while we brainstorm a better caching mechanism - let me make one last pass through it and I'll send an OK later this afternoon.
>>
>>> What do you mean by "making further real improvements in the algorithms"?
>>
>>
>> Things like investigating DDA in the inner loops, alternate ways to manage crossings that require less sorting, enhancing the stroker to do non-uniform scaling (which would eliminate the double transform in some cases), potentially passing the crossings to the output loop rather than having to convert them into an alpha array, etc. A lot of that is easier to review if there isn't a lot of extra "this line added 1% performance" code adding visual noise.
>>
>>> Who could help? I am the only one working concretely on this code; help wanted.
>>
>>
>> I'd love to get more people involved, but history tells us that this is a fairly lonely field.
>> :(
>>
>> I may have some time, now that FX is in tightened bug-fix mode, to do some tinkering, but mainly I will probably be much quicker on getting reviews done, which may help as much as having another pair of hands in practice...
>>
>> ...jim
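The generic-cache framework proposed in the thread (an ArrayCache keyed on the array type, with a lambda factory) could be sketched roughly as below. The class and method names are hypothetical, the single ArrayDeque pool stands in for real size-bucketed storage, and, as Jim notes, the type parameter must be the array type itself (e.g. int[]) because Java generics cannot abstract over primitives:

```java
import java.util.ArrayDeque;
import java.util.function.IntFunction;

// Rough sketch of a generic array cache: the array type itself (int[],
// float[], byte[]) is the type parameter, and an IntFunction<A> lambda
// allocates new arrays when the pool is empty. Real code would bucket
// arrays by size and distinguish clean vs dirty variants.
final class SketchArrayCache<A> {
    private final IntFunction<A> factory;
    private final ArrayDeque<A> pool = new ArrayDeque<>();

    SketchArrayCache(IntFunction<A> factory) {
        this.factory = factory;
    }

    A getArray(int length) {
        final A cached = pool.pollLast();
        return (cached != null) ? cached : factory.apply(length);
    }

    void putArray(A array) {
        pool.offerLast(array); // a "clean" variant would zero-fill here
    }

    public static void main(String[] args) {
        SketchArrayCache<int[]> cache = new SketchArrayCache<>(int[]::new);
        int[] a = cache.getArray(16);
        cache.putArray(a);
        // the next get returns the recycled instance, not a new allocation
        System.out.println(a == cache.getArray(16)); // prints true
    }
}
```

A clean/dirty split would then be two subclasses (or a boolean strategy) differing only in whether putArray clears the returned array.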