Hi Jim,

Here is my detailed answer:
2015-06-17 22:43 GMT+02:00 Jim Graham <james.gra...@oracle.com>:
> Hi Laurent,
>
> There were 1 or 2 very minor fixes I suggested in a couple of files that
> you indicated you were going to do, but I didn't see a new webrev. Is
> there one? I think the existing array stuff is probably OK and I'll look
> at it again now that I understand the tradeoffs of the current design of
> it, but I wanted to know if there was a newer webrev for me to review based
> on those other minor fixes?

Sorry, I did not make a new webrev: I took time to review the array cache /
reset state code again and tried some simplifications (postponed to #3).

On 6/12/15 5:45 AM, Laurent Bourgès wrote:
>> A general thought - should we have foo_initial arrays if we are
>> getting everything from a cache?
>>
>> I adopted this approach to maximize performance for the 99% cases: I
>> sized the initial arrays properly and they take ~512 KB per
>> RendererContext: the ArrayCache (Weak) is only used for extreme cases (a
>> bit slower). In March 2013, I started by using the cache for all array
>> accesses and it was 30% slower!
>
> Was the 30% back when we were allocating or clearing all of the arrays?
> Now that we only clear what we use, the initial sizes of the arrays
> shouldn't matter so much. It would be worth a little extra size overhead
> to not have so many special cases and special handling for the "_initial"
> arrays if the performance is now essentially identical.

I agree: the latest webrev added dirty caches for float / int arrays, so
fewer arrays must be zero-filled before putting them back in the cache.
It improved the gap by ~3% on average:

==> marlin_REF.log <==
Threads        4        1        2        4
Pct95    237.848  233.887  238.430  241.226

==> marlin_NO_INITIALS.log <==
Threads        4        1        2        4
Pct95    244.261  241.116  244.028  247.639

However, the performance gap is larger for maps with many shapes: ~6% (4T)

REF:
Test                                      Threads  Ops      Med    Pct95      Avg  StdDev      Min      Max  TotalOps  [ms/op]
dc_shp_alllayers_2013-00-30-07-00-47.ser        1   25  762.882  764.781  763.136   0.715  762.219  765.110        25
dc_shp_alllayers_2013-00-30-07-00-47.ser        2   50  768.975  774.134  769.167   4.105  764.517  774.750        50
dc_shp_alllayers_2013-00-30-07-00-47.ser        4  100  770.511  775.448  770.448   4.668  765.125  787.473       100

NO_INITIALS:
Test                                      Threads  Ops      Med    Pct95      Avg  StdDev      Min      Max  TotalOps  [ms/op]
dc_shp_alllayers_2013-00-30-07-00-47.ser        1   25  808.793  810.227  808.959   0.752  807.831  810.871        25
dc_shp_alllayers_2013-00-30-07-00-47.ser        2   50  813.435  815.646  813.357   1.361  811.076  817.711        50
dc_shp_alllayers_2013-00-30-07-00-47.ser        4  100  815.775  823.593  817.352   6.752  813.031  872.658       100

A notable difference between initial arrays and the cache concerns GC:
initial arrays are well-sized and stored directly in the RendererContext
(a soft reference in my benchmarks), but the cache is stored using a weak
reference => more GC overhead.

I think it is possible to encapsulate the initial array into a
CacheReference that will handle both the cache (appropriate Dirty / Clean
ArrayCache) and the initial array (tuned size) to simplify the renderer's
code.

>> I will look again at these cases: I agree "leaking refs" should be set
>> to null to be sure. However, Marlin code always executes the pattern
>> init() / dispose(), so leaking pointers are not critical, but OK, let's
>> set all of them to null.
>>
>> For initial arrays, it seems redundant to set them to null in dispose()
>> and back to initial in init() = useless and counter-productive!
>
> It gets easier to review and maintain code if there is a clear reset to
> some initial conditions is all.
> I would like to try to keep the code as
> simple and straightforward as possible while evaluating algorithmic
> improvements without introducing potential bugs with a lot of special case
> code and then we can worry about the 1%'s later.

Agreed: I will make efforts in step #3 to improve / simplify the init /
dispose sequence = reset state in a uniform manner. Notably, the Stroker /
Dasher dispose() must be called as early as possible from the Renderer
directly (endRendering and / or dispose) in case of exception.

> Also, we may find moving the code to native to be more of a performance
> improvement than trying to fight with Hotspot's load/store policies and so
> any code that tries to make "slightly more optimal byte codes" would end up
> not being needed and would add some noise to any such native conversion.

Up to now I prefer coding in Java, as HotSpot does all that optimization
for free and it saves me from coding in C++: error prone and a lot more
complex to keep clear. Moreover, I installed and tried JITWatch + the
HotSpot assembler plugin to let me look at the asm code (+ hints): if I
have some free time, I could have a look and see...

> But, I do agree that technically you are correct about some of these
> suggestions leading to some provably redundant code. The issue is more
> that constantly having to reprove the redundancies gets in the way of some
> other progress we could be making.

Ok, I plan to rewrite the dispose code in step #3.

> And, on the other hand, right now you are the engineer driving this from
> the workload point of view (and I apologize if I am inadvertently in a role
> that seems to be described as "the engineer getting in the way"), so I'm
> willing to rely on your commitment to be diligent about these
> optimizations. I'm just stating a case for whether they should be the
> focus yet.

Thanks; if you need, I can help you run MapBench on your side.
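To illustrate the "dispose as early as possible, even on exception" idea, here is a minimal sketch. The class names (Helper, Renderer) and the boolean flag are illustrative only, not Marlin's real API; the point is the try / finally shape that guarantees the helpers release their arrays even if endRendering() throws:

```java
// Sketch of the early-dispose pattern: the Renderer guarantees that its
// helpers are released even when endRendering() fails with an exception.
// All names here are hypothetical, not the actual Marlin classes.
final class Helper {
    boolean disposed = false;
    void dispose() { disposed = true; }  // would return arrays to the cache
}

final class Renderer {
    final Helper stroker = new Helper();
    final Helper dasher = new Helper();

    void endRendering() {
        try {
            // ... emit the remaining edges; this part may throw ...
        } finally {
            // helpers are no longer needed: release their arrays right away
            stroker.dispose();
            dasher.dispose();
        }
    }
}

public class DisposeSketch {
    public static void main(String[] args) {
        Renderer r = new Renderer();
        r.endRendering();
        System.out.println(r.stroker.disposed && r.dasher.disposed); // true
    }
}
```

Calling dispose() from the finally block keeps the reset-state sequence in one place instead of relying on each caller to remember it.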
For me, Marlin is fast enough now (like Ductus) to focus on quality: I
mentioned proper gamma correction (but text rendering issues) & resampling
filters (moire...) or adaptive super sampling.

>> Sorry, it is not a flaw but an advantage: the initial arrays that can
>> grow in widenXXXArray() have an odd size as indicated by the comment:
>>     // +1 to avoid recycling in Helpers.widenArray()
>>     private final int[] edgePtrs_initial = new int[INITIAL_SMALL_ARRAY
>>         + 1]; // 4K
>
> I saw that after the fact, but my main point remains that this is a
> fragile inter-module optimization. I was unaware that it could save us 30%
> to have special-cased "initial" arrays, and that fact makes this
> relationship that much more important but I wonder if that is still true.

You're right: now it is ~5%. However, it also generates more garbage (weak
references) and maybe a bigger memory footprint: so for web servers, it
may be a lot worse!

> If we ditch the initial arrays because the new "dirty array" management
> removes that performance penalty then this all goes away. And, even if we
> keep the initial array concept it would be nice to find a more reliable way
> to manage that than a side effect of its length.

I propose to keep it but use a better object-oriented design =
CacheReference (cache & initial array references), very close to what you
proposed, so it will handle all the initial array & caching complexity
consistently per array type.

>> I wanted to have a simple but robust API that can be easily used in
>> Marlin.
>
> I'd like to think that I'm working/promoting towards simple and robust
> APIs as well.

Of course, you are.
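The CacheReference idea could look roughly like this. This is only a sketch under my assumptions: the class and method names (IntCacheReference, getArray, putArray) are hypothetical, and the Dirty / Clean cache split is elided to a single weakly referenced fallback array:

```java
import java.lang.ref.WeakReference;

// Hypothetical sketch of the proposed CacheReference: it owns the tuned,
// well-sized initial array and falls back to a weakly referenced cache
// only when a larger array is requested, hiding both concerns from the
// renderer code.
final class IntCacheReference {
    private final int[] initial;                 // kept alive by RendererContext
    private WeakReference<int[]> cached = new WeakReference<>(null);

    IntCacheReference(int initialSize) {
        this.initial = new int[initialSize];
    }

    int[] getArray(int neededSize) {
        if (neededSize <= initial.length) {
            return initial;                      // 99% case: no allocation, no GC pressure
        }
        int[] larger = cached.get();
        if (larger == null || larger.length < neededSize) {
            larger = new int[neededSize];        // extreme case: allocate, remember weakly
            cached = new WeakReference<>(larger);
        }
        return larger;
    }

    void putArray(int[] array) {
        if (array != initial) {                  // the initial array never enters the cache
            cached = new WeakReference<>(array);
        }
    }
}

public class CacheReferenceSketch {
    public static void main(String[] args) {
        IntCacheReference ref = new IntCacheReference(1024);
        System.out.println(ref.getArray(100).length);   // 1024 (initial array reused)
        System.out.println(ref.getArray(5000).length);  // 5000 (grown, weakly cached)
    }
}
```

The renderer would then only ever talk to the CacheReference, so the odd-length convention used today to keep initial arrays out of the cache would no longer be needed.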
>> I agree the documentation can be improved a lot and I could introduce a
>> new helper method in ArrayCache to create the initial arrays in a safer
>> and explicit way:
>>
>>     int[] createNonCachedIntArray(int length) {
>>         // return only odd lengths to detect wrong cache usage
>>         if ((length & 1) == 0) {
>>             length += 1;
>>         }
>>         return new int[length];
>>     }
>
> In a future effort, if we find that we must keep some small initial arrays
> for performance, we could make this a more explicit and tunable parameter
> on the caches. Let's table that until we consider if they buy us any
> performance after these dirty array changes.

Ok, let's see...

>> Anyway, I agree the cache handling can be improved a lot: try using
>> generics for array types with a lambda factory to allocate them.
>
> I just wish that generics could handle primitive arrays, but they can't.
> I did a similar generification thing for the raster format conversion code
> in FX, but I had to play games with the generic types being ByteBuffer and
> IntBuffer and having the variants of those accept the proper type of
> primitive arrays. It wasn't pretty, but it did work out in the end.

I know generics do not work on primitive arrays, but it seems possible to
use abstract methods:

    ArrayCache<K>
    - K create(int length)
    - copy(K array, ...)
    - fill(K array)
    ...

implemented by:

    IntArrayCache extends ArrayCache<int[]>
    - int[] create(int length) { return new int[length]; }
    - copy(int[] array) { System.arraycopy(array, ...); }

However, it may give ugly code... wrapping all the array utility methods
(length, fill, copy, ...) in abstract methods.

If that does not work or is too slow, I could use a code generator to
produce all the needed variants from a template. Do you have such a code
generator in OpenJDK?

>> The caching code is an example of "code with no obvious flaws
>> because it is hard to figure out if it has flaws".
>> We should strive
>> for "code with obviously no flaws because it is easy to verify that
>> it is doing everything correctly".
>>
>> I disagree with the term "flaw" but I agree it is hard to figure out
>> what is done.
>
> I took that from a talk on producing more secure code. I think it
> originally came from an overall system design that could easily be proven
> secure. The first release of Java required all sensitive Java code to be
> constantly asking "am I in the context that this operation is allowed?" and
> the detection was fragile. It was replaced by code where all sensitive
> operations were rejected unless the surrounding code proved its protected
> context, which was much more secure. "Disallowed unless requested" was
> harder to make a mistake than "allowed unless it looks questionable".
>
> But, that same philosophy serves a lot of code well. If you start having
> a bunch of "agreements" between modules then they can quickly get out of
> sync through regular maintenance and sometimes spiral out of control, and
> you spend more time verifying that you're still satisfying all of these
> conventions rather than letting the code make sure things are kosher
> because only valid operations are expressible.

Thanks for the explanations. I totally agree; sometimes I write code that
is too complicated (or fancy).

>> OK, I can work later on a new framework:
>> - ArrayCache<K> abstract class where <K> is byte[], int[], float[]
>> - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
>> - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
>
> I think I can trust the existing architecture for at least a putback of
> the last webrev that you sent out while we brainstorm a better caching
> mechanism - let me make one last pass through it and I'll send an OK later
> this afternoon.

Perfect.

>> What do you mean by "making further real improvements in the
>> algorithms"?
> Things like investigating DDA in the inner loops, alternate ways to manage
> crossings that require less sorting, enhancing the stroker to do
> non-uniform scaling (which would eliminate the double transform in some
> cases), potentially passing the crossings to the output loop rather than
> having to convert them into an alpha array, etc. A lot of that is easier
> to review if there aren't a lot of extra "this line added 1% performance"
> code adding visual noise.

>> Who could help? I am the only one working concretely on this code; help
>> wanted.
>
> I'd love to get more people involved, but history tells us that this is a
> fairly lonely field. :(

Too bad: it cost me a lot of evenings in past months, so I wonder how long
I can afford such an intense effort (family, holidays)!

> I may have some time now that FX is in tightened bug-fix mode to do some
> tinkering, but mainly I will probably be much quicker on getting reviews
> done which may help as much as having another pair of hands in practice...

Great news: you will definitely help me a lot by making reviews, giving
advice, and maybe some code.

Laurent