mbeckerle commented on PR #1286: URL: https://github.com/apache/daffodil/pull/1286#issuecomment-2329763424
Maybe see if the Stack Overflow folks can clarify the "I'm not 100% sure how to interpret this" problem?

On Fri, Aug 30, 2024 at 9:59 AM Steve Lawrence ***@***.***> wrote:

> After a number of runs, I was able to reproduce the issue of different
> saved parsers in GitHub twice (I'm still unable to do it locally) and
> download the differing saved parsers. I was able to dump the serialized
> processors to text (using NickstaDB/SerializationDumper
> <https://github.com/NickstaDB/SerializationDumper>) and do a diff, and
> found the only difference in both cases was this:
>
>   delimiters
>     (array)
> - TC_ARRAY - 0x75
> - TC_REFERENCE - 0x71
> - Handle - 8258633 - 0x00 7e 04 49
> - newHandle 0x00 7e 04 c6
> - Array size - 0 - 0x00 00 00 00
> - Values
> + TC_REFERENCE - 0x71
> + Handle - 8258634 - 0x00 7e 04 4a
>
> The delimiters member is the Array[DelimiterParseEv] in the
> DelimiterStackParser. I'm not 100% sure how to interpret this, but I
> think in the first case the delimiter array is a uniquely allocated
> zero-length array, where the reference is to the type (i.e. the class
> description for DelimiterParseEv). In the second case, it just points to
> an already existing zero-length array. So in one case the zero-length
> array is shared, in the other it's not. So functionally exactly the same,
> just one has extra allocations.
>
> I wonder if maybe there is some optimization somewhere that detects
> zero-length arrays and shares them instead of allocating new arrays? And
> sometimes that happens and sometimes it doesn't, which leads to differences
> in serialization?
>
> I imagine we could maybe do something like manually detect when an array
> is zero length and make sure we use Array.empty() or some globally
> allocated zero-length array, but I imagine we're unlikely to ever find all
> cases, so reproducibility will still be an issue.
>
> On a positive note, I have confirmed the difference has nothing to do with
> paths, so that issue preventing reproducibility is definitely fixed. But I
> wonder if we should disable this test to avoid random CI failures? Maybe in
> the future we can switch to some serialization format that is more reliably
> consistent?
>
> Any thoughts?
>
> View it on GitHub:
> <https://github.com/apache/daffodil/pull/1286#issuecomment-2321350663>
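
For anyone who wants to poke at the interpretation in the quoted comment, here is a minimal Java sketch of the effect as I understand it. It is not Daffodil code: the `Holder` class is hypothetical, and `String[]` stands in for the `Array[DelimiterParseEv]` member. It only relies on standard `java.io` serialization, where a second occurrence of the same array instance is written as a back-reference, while a separately allocated zero-length array is written out again as a new array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class EmptyArraySerializationDemo {

    // Hypothetical stand-in for an object graph that holds two delimiter arrays.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        final String[] first;
        final String[] second;

        Holder(String[] first, String[] second) {
            this.first = first;
            this.second = second;
        }
    }

    // Serialize an object with standard Java serialization and return the raw bytes.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Case 1: both fields point at the same zero-length array instance.
        String[] shared = new String[0];
        byte[] sharedBytes = serialize(new Holder(shared, shared));

        // Case 2: each field gets its own freshly allocated zero-length array.
        byte[] distinctBytes = serialize(new Holder(new String[0], new String[0]));

        System.out.println("shared:    " + sharedBytes.length + " bytes");
        System.out.println("distinct:  " + distinctBytes.length + " bytes");
        System.out.println("identical: " + Arrays.equals(sharedBytes, distinctBytes));
    }
}
```

The two object graphs behave the same when deserialized, but the byte streams differ. So if whatever builds the parser sometimes reuses an empty array and sometimes allocates a new one, a byte-for-byte comparison of saved parsers would fail in the way the dump shows.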
