Sorry Greg, I'll use these jars next week. Didn't get to it...

________________________________
From: Greg Dove <[email protected]>
Sent: Friday, May 15, 2026 3:35 AM
To: [email protected] <[email protected]>
Subject: Re: Non-deterministic output issues

Hi Josh,

I don't yet have what I feel is a definitive justification that the 'fix' works, because testers have not had enough time with it. But for those that have been using it, thus far they have not encountered the problem. So I am not making a PR yet, because I would like to be more confident that this at least fixes it - that might take a few more days.

However I did push the commits to the following branch:
https://github.com/apache/royale-compiler/tree/local_var_resolution_issue
in case you want to look at it.

The symptom was a rare failure to find the correct type of a local var inside a function scope. Usually the first one worked, but others did not.

The assumption for how it occurs is:

1. Memory Pressure: A GC event clears the FunctionNode but leaves the FunctionScope intact.
2. Scope Reconnection: When the FileNode is re-parsed, it creates a new "shell" FunctionNode. The existing FunctionScope is reconnected to this new node.
3. The "Wipe": To ensure no stale data remains, the reconnection process explicitly wipes all local variable definitions from the scope.
4. The Failure: Under certain conditions, the compiler would attempt to resolve a name (look up a variable) after the wipe but before the function body was fully re-parsed. Because the scope was empty, the resolution would either return null (untyped) or incorrectly find something with the same name in an outer scope (e.g., a class member).

The general 'solution' is intended as a sort of "just-in-time" restoration of local definitions by ensuring that the act of asking for a definition triggers the re-parsing of the body if it is missing.

Because it is so hard to repro, the sequence 1-4 above is still an assumed cause, which is why I need to wait for confirmation of it not happening any more from others who were seeing it more routinely than I did. But it does match the symptoms exactly, which makes me hopeful.

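As a rough illustration of steps 1-4 and the "just-in-time" idea, here is a minimal Java sketch. It is not the code in the branch: LazyScopeSketch, LocalScope, reconnectToNewNode, resolveWithoutRestore and resolve are all hypothetical names, types are plain strings, and "re-parsing the body" is reduced to a callback that refills the map of locals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch only: these classes and methods are illustrative
// and are not taken from the royale-compiler sources.
class LazyScopeSketch {

    static class LocalScope {
        private final Map<String, String> outerScope;            // name -> type, e.g. class members
        private final Map<String, String> locals = new HashMap<>();
        private final Consumer<Map<String, String>> bodyParser;  // stands in for re-parsing the body
        private boolean localsPopulated = false;

        LocalScope(Map<String, String> outerScope, Consumer<Map<String, String>> bodyParser) {
            this.outerScope = outerScope;
            this.bodyParser = bodyParser;
        }

        // Steps 2-3: the scope is reconnected to a new shell node and its locals are wiped.
        synchronized void reconnectToNewNode() {
            locals.clear();
            localsPopulated = false;
        }

        // Step 4, the symptom: a lookup that does not restore the locals first.
        synchronized String resolveWithoutRestore(String name) {
            String type = locals.get(name);
            return type != null ? type : outerScope.get(name);
        }

        // The "just-in-time" idea: asking for a definition refills the locals if they are missing.
        synchronized String resolve(String name) {
            if (!localsPopulated) {
                bodyParser.accept(locals);
                localsPopulated = true;
            }
            String type = locals.get(name);
            return type != null ? type : outerScope.get(name);
        }
    }

    public static void main(String[] args) {
        Map<String, String> classMembers = new HashMap<>();
        classMembers.put("item", "Object");
        LocalScope scope = new LocalScope(classMembers, locals -> locals.put("item", "XML"));

        System.out.println(scope.resolve("item"));               // first lookup populates the locals
        scope.reconnectToNewNode();                               // simulated wipe after a GC event
        System.out.println(scope.resolveWithoutRestore("item"));  // falls through to the class member
        System.out.println(scope.resolve("item"));                // locals restored on demand
    }
}
```

Running main prints XML, Object, XML: the middle line is the wrong answer from the outer scope described in step 4, and the last line shows the lookup recovering once it triggers the restore itself. The methods are synchronized here only to keep the sketch simple; whether locking is acceptable in the real compiler is a separate question.
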
There is a new test in there (NameResolutionAfterGCTest) that is intended to simulate something close to the above.

There are a few other changes in the branch - changes related to cache, and also (something I discovered as another 'rare' thing after the name lookup changes, when running regular royale framework build) to thread-safety with metatag Array - that might need more work/attention. So far I see no noticeable adverse effect to performance. There are a few other minor changes I used during logging that I left in there as well.

Anyhow, feel free to come up with a better way to do this if it is obvious to you (assuming it is confirmed to be the cause of the problem - I hope it is).

Thanks
-Greg

On Thu, May 14, 2026 at 8:34 AM Greg Dove <[email protected]> wrote:

> Sounds good Josh, I will do that. I haven't heard feedback yet from day one of others testing the patched compiler. I will wait one more day, (still with my fingers crossed) before I even assume that it works.
>
> On Thu, May 14, 2026 at 4:48 AM Josh Tynjala <[email protected]> wrote:
>
>> Thanks, Greg. I guess a branch is good, in case I'd like to suggest any tweaks.
>>
>> --
>> Josh Tynjala
>> Bowler Hat LLC
>> https://bowlerhat.dev/
>>
>> On Tue, May 12, 2026 at 10:10 PM Greg Dove <[email protected]> wrote:
>>
>> > Josh, I *think* it might be a combination of the two. I'm asking others who were seeing it more often than I did to test a possible fix (I will share an updated compiler build with them), because repro of the actual issue is still quite challenging.
>> > *if* it works (maybe will need 1-2 days to be sure), I'll push it either directly to dev or via branch/PR (lmk what you prefer) and I'd certainly appreciate your review of that if possible. I did use a bit of AI support for the sleuthing and the testing, but you have spent much more time in the compiler codebase than I have.
>> > For now, though, fingers crossed....
>> >
>> > On Wed, May 13, 2026 at 4:00 AM Josh Tynjala <[email protected]> wrote:
>> >
>> > > Thanks for the update, Greg. Threading could certainly be a cause if we are missing some kind of synchronization. I know that we have workspace.startBuilding() and workspace.startIdleState() as ways of ensuring threads are under control. We may be missing one of those calls somewhere before emitting JS.
>> > >
>> > > As for GC, I recall that reducing JVM memory wasn't necessarily enough for me to reproduce the other GC related bug I mentioned, strange as that seems. I remember also adding System.gc() calls in various places (though I don't remember exactly where), and I think that's what finally allowed me to reproduce the issue semi-reliably.
>> > >
>> > > --
>> > > Josh Tynjala
>> > > Bowler Hat LLC
>> > > https://bowlerhat.dev/
>> > >
>> > > On Mon, May 11, 2026 at 10:15 PM Greg Dove <[email protected]> wrote:
>> > >
>> > > > Hey Josh,
>> > > >
>> > > > I am actively looking into this again. I am less convinced that it is GC related (I reduced memory allocation to low levels) and perhaps it is more to do with threads/race-conditions. But it's very difficult to be sure, I spent today adding logging and trying to repro, but did not repro the bug all day. I will keep on this tomorrow trying to find the right conditions to force it to occur. If I can figure out what those are, I will share them with you.
>> > > >
>> > > > -Greg
>> > > >
>> > > > On Tue, May 5, 2026 at 9:06 AM Greg Dove <[email protected]> wrote:
>> > > >
>> > > > > Wake up brain (self talk):
>> > > > > "and then not wrong for subsequent output" <- should be of course "and then wrong for subsequent output".
>> > > > >
>> > > > > On Tue, May 5, 2026 at 9:05 AM Greg Dove <[email protected]> wrote:
>> > > > >
>> > > > >> Thanks for looking into this, Josh.
>> > > > >>
>> > > > >> "If it isn't too difficult to reproduce"
>> > > > >> Quick comments, just in case it helps:
>> > > > >>
>> > > > >> It was not something I could repro for debugging purposes in the compiler. It was still 'rare' in practice - max 2-3 times per day that I observed, sometimes only once a day - and not manifesting in the same code - although perhaps that is simply because code can change a lot between compiler runs - and "awareness" was based on the app not starting up correctly or noticeable runtime errors. I did not check this: perhaps it is happening more often than I think but with no side effects. This could happen if it sometimes outputs a typed method as instance.method() where type resolution worked and elsewhere alongside as instance['method']() where it did not. The problem might not simply get noticed in this case, but this is pure speculation, I have not checked for this.
>> > > > >>
>> > > > >> I did not try reducing heap allocation or anything to try to create conditions for it to perhaps happen more often if it is memory/GC related.
>> > > > >>
>> > > > >> I see notes like this in the code:
>> > > > >> // If we get this far, then we did not find a cached entry
>> > > > >> // It is possible for 2+ threads to get in here for the same name.
>> > > > >> // This is intentional - the worst that happens is that we duplicate the resolution work
>> > > > >> // the benefit is that we avoid any sort of locking, which was proving expensive (time wise,
>> > > > >> // and memory wise).
>> > > > >>
>> > > > >> When you see the code that was problematic output, you can see the same name lookup inside a js method that is obviously correctly resolved (anecdotally it seems to be more often 'correct' the first time) and then not wrong for subsequent output, in nearby code, so I assume it might be related to some unsynchronized state or failure to do that 'duplicate' resolution work, where the various parts were being processed in parallel...
>> > > > >>
>> > > > >> Anyway, good luck, please let me know if you have anything you think I could do to help.
>> > > > >>
>> > > > >> On Tue, May 5, 2026 at 6:29 AM Harbs <[email protected]> wrote:
>> > > > >>
>> > > > >>> Sure. I’ll be in touch off list.
>> > > > >>>
>> > > > >>> > On May 4, 2026, at 9:18 PM, Josh Tynjala <[email protected]> wrote:
>> > > > >>> >
>> > > > >>> > Would you be willing to give me access to the project? If it isn't too difficult to reproduce, I may be able to figure out what's going on and how to restore the missing typing data, similar to my other fix. My feeling is that the original Adobe devs intended for occasional garbage collection to occur to stay within memory limits, but that the data would be restorable, if needed later. I think that they simply missed some places where it might need to be restored because it happens pretty rarely. Or maybe our newer JS emitter isn't properly accounting for that possibility.
>> > > > >>> >
>> > > > >>> > --
>> > > > >>> > Josh Tynjala
>> > > > >>> > Bowler Hat LLC
>> > > > >>> > https://bowlerhat.dev/
>> > > > >>> >
>> > > > >>> > On Mon, May 4, 2026 at 10:37 AM Harbs <[email protected]> wrote:
>> > > > >>> >
>> > > > >>> >>> You've tested that this issue still reproduces using a compiler built from the latest source code?
>> > > > >>> >>
>> > > > >>> >> This was reproduced by a number of devs all working on the same project. And yes, it was with recent builds.
>> > > > >>> >>
>> > > > >>> >> I don’t think I personally have seen it (I have a lot of memory on my machine), but it seems to have gotten worse recently. I don’t know if something changed in the compiler or it’s due to the increased project size.
>> > > > >>> >>
>> > > > >>> >> This was with variables - not functions.
>> > > > >>> >>
>> > > > >>> >> Harbs
>> > > > >>> >>
>> > > > >>> >>> On May 4, 2026, at 6:54 PM, Josh Tynjala <[email protected]> wrote:
>> > > > >>> >>>
>> > > > >>> >>> This issue may be the same one:
>> > > > >>> >>>
>> > > > >>> >>> https://github.com/apache/royale-compiler/issues/182
>> > > > >>> >>>
>> > > > >>> >>> I also encountered and fixed an issue related to weak references a little over a year ago. Function bodies were getting garbage collected, and I needed to clear out some stale definitions that were causing missing classes in generated ASDoc output and some similar issues with the -watch compiler option.
>> > > > >>> >>>
>> > > > >>> >>> https://github.com/apache/royale-compiler/commit/35eed62f13519c659e6346d26cca3f44afe3170f
>> > > > >>> >>>
>> > > > >>> >>> This fix does not appear to have made it into a release yet. You're not using an older compiler build, right? You've tested that this issue still reproduces using a compiler built from the latest source code?
>> > > > >>> >>>
>> > > > >>> >>> --
>> > > > >>> >>> Josh Tynjala
>> > > > >>> >>> Bowler Hat LLC
>> > > > >>> >>> https://bowlerhat.dev/
>> > > > >>> >>>
>> > > > >>> >>> On Sun, May 3, 2026 at 9:40 PM Greg Dove <[email protected]> wrote:
>> > > > >>> >>>
>> > > > >>> >>>> Compiler issues - (Josh, please?)
>> > > > >>> >>>>
>> > > > >>> >>>> We have a medium-sized project that has begun encountering occasional/rare (but at least daily during normal workloads) compilation issues that appear to be related to name/type resolution. There can be code within a method output where the name resolves correctly to its type in one part of the method's js output and elsewhere within the same js method output as if it was Object/untyped. This is most obvious with XML or XMLList instances (because of .child('prop') vs ['prop'] differences). I've also seen it get confused between local variables and instance properties in some cases, which I believe is a manifestation of the same thing.
>> > > > >>> >>>> In other words, different compilation runs with the exact same settings are not completely deterministic, because sometimes they can provide different output. It is very difficult to repro, because it feels so random. But it has been something that appears to be more frequent as the codebase grows (when all other settings remain the same). This led me to consider that it could be GC-related, and I recently removed the SoftReferences inside ASScopeCache, as a prime suspect.
>> > > > >>> >>>>
>> > > > >>> >>>> After doing this, I have not seen the problem since (so far - after 1.5 days)
>> > > > >>> >>>>
>> > > > >>> >>>> The ASScopeCache instances themselves are weakly held (inside CompilerProject). So the internal maps inside each of these instances being weakly held as well seems to be the problem, the internal maps can perhaps get into a partially cleared state between threads.
>> > > > >>> >>>>
>> > > > >>> >>>> I did some memory profiling with and without this change for removing the SoftReferences inside ASScopeCache - but it was quite limited (just testing with compiling the one project). The memory usage was not much different on a typical run (approx 1Mb difference for a compilation with around 1000 .as and .mxml files combined, alongside a bunch of local swcs).
>> > > > >>> >>>> There was possibly a small speed up without the SoftReferences, but I did not test enough to be sure.
>> > > > >>> >>>> But so far it seems there is not a big impact on memory with omitting these. If it introduces consistency I'm kinda keen to get it in there - I know others have definitely seen this problem too.
>> > > > >>> >>>> And for Josh in particular: I think your compiler experience dwarfs the rest of us and wanted to get your feedback instead of just jumping in with this one. One option could also be to make this change as a compiler option, with the new non-weak references being the default, but with the ability to switch to the older behaviour via the option if that was considered important as well... look forward to hearing your thoughts.
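
The compiler-option idea in the original message (strong references by default, with the older soft-reference behaviour still available) could look roughly like the Java sketch below. This is an assumption-laden illustration, not ASScopeCache itself: ScopeCacheSketch, LookupCache, the useSoftReferences flag and simulateMemoryPressure are hypothetical names, and the resolver is a plain function from name to type string. It keeps the lock-free lookup described in the code notes quoted above, where two threads may duplicate the resolution work.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch only: this is not ASScopeCache, and the
// useSoftReferences flag is an invented stand-in for a compiler option.
class ScopeCacheSketch {

    static class LookupCache {
        private final boolean useSoftReferences;
        private final Function<String, String> resolver;   // name -> resolved type
        private final ConcurrentHashMap<String, Object> entries = new ConcurrentHashMap<>();

        LookupCache(boolean useSoftReferences, Function<String, String> resolver) {
            this.useSoftReferences = useSoftReferences;
            this.resolver = resolver;
        }

        String lookup(String name) {
            String cached = unwrap(entries.get(name));
            if (cached != null) {
                return cached;
            }
            // No lock here: two threads may both resolve the same name,
            // which only duplicates the resolution work.
            String resolved = resolver.apply(name);
            if (resolved != null) {
                if (useSoftReferences) {
                    entries.put(name, new SoftReference<String>(resolved));
                } else {
                    entries.put(name, resolved);
                }
            }
            return resolved;
        }

        // Returns the cached value, or null if absent or already cleared.
        private static String unwrap(Object held) {
            if (held instanceof SoftReference) {
                return (String) ((SoftReference<?>) held).get();
            }
            return (String) held;
        }

        // Test hook: clears soft entries the way the GC may under memory pressure.
        void simulateMemoryPressure() {
            for (Object held : entries.values()) {
                if (held instanceof SoftReference) {
                    ((SoftReference<?>) held).clear();
                }
            }
        }
    }

    public static void main(String[] args) {
        for (boolean soft : new boolean[] {false, true}) {
            AtomicInteger resolutions = new AtomicInteger();
            LookupCache cache = new LookupCache(soft, name -> {
                resolutions.incrementAndGet();
                return "XML";
            });
            cache.lookup("item");
            cache.simulateMemoryPressure();
            cache.lookup("item");
            System.out.println("soft=" + soft + " resolutions=" + resolutions.get());
        }
    }
}
```

With strong references the name is resolved once; with soft references a cleared entry forces a second resolution. That is harmless only if re-resolving always yields the same answer, which is exactly what the thread is questioning.
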
