[jira] [Commented] (DAFFODIL-2627) Performance regression in TDML processor

Steve Lawrence (Jira) Fri, 28 Jan 2022 07:19:08 -0800


    [ 
https://issues.apache.org/jira/browse/DAFFODIL-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483816#comment-17483816
 ]


Steve Lawrence commented on DAFFODIL-2627:
------------------------------------------

Yeah, it sounds like the garbage collector is just too aggressive, or maybe 
IntelliJ triggers the garbage collector between test suites? How much memory 
are you giving the JVM? It's possible that you're low on memory, so it's 
triggering the garbage collector more often?

Regardless, relying on the garbage collector is probably not a great idea in 
hindsight; we just don't have enough control over the cache. We probably do 
need to replace the WeakHashMap cache with an actual cache with a timer and 
more control over when cache items are evicted. Though, I wonder if we also 
need to detect changes in the schema to trigger a rebuild even if something is 
in the cache? We didn't have that problem when the cache was in a Runner 
because runners were recreated every time a test ran. Maybe with a small 
enough expiration time this isn't an issue?
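
As a rough illustration of the kind of cache described above, here is a 
minimal sketch in Java. It is not Daffodil's actual implementation: the class 
name ExpiringCache, the "stamp" parameter (standing in for something like a 
schema file's last-modified time), and the injectable clock are all 
hypothetical names chosen for the example. An entry is rebuilt when it has 
not been used within the expiration time or when its stamp changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a timed cache; not Daffodil's real cache. */
final class ExpiringCache<K, V> {

    /** A cached value plus the data needed to decide whether it is stale. */
    private static final class Entry<T> {
        final T value;
        final long stamp;              // assumed: e.g. schema last-modified time
        volatile long lastUsedMillis;  // refreshed on every cache hit

        Entry(T value, long stamp, long now) {
            this.value = value;
            this.stamp = stamp;
            this.lastUsedMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;  // injectable so tests can control time

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, recompiling if it is missing, expired, or changed. */
    V get(K key, long stamp, Supplier<V> compile) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) -> {
            boolean stale = old == null
                || old.stamp != stamp
                || now - old.lastUsedMillis > ttlMillis;
            if (stale) {
                return new Entry<>(compile.get(), stamp, now);
            }
            old.lastUsedMillis = now;
            return old;
        });
        return entry.value;
    }

    /** Drops entries idle longer than the TTL; a timer could call this periodically. */
    void evictExpired() {
        long now = clock.getAsLong();
        entries.values().removeIf(e -> now - e.lastUsedMillis > ttlMillis);
    }

    int size() {
        return entries.size();
    }
}
```

In real use the clock would presumably be System::currentTimeMillis and 
evictExpired() would run from something like a ScheduledExecutorService. The 
stamp check is one possible answer to the schema-change question; a short 
expiration time alone would only narrow the window in which a stale schema 
could be served.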

That said, I feel like the problem here is really JUnit (and probably most 
other unit test tools)--they just aren't designed to support sharing objects 
between test suites, so we have to hack together this global cache. And our 
Runner implementation is also maybe part of the problem, since each Runner only 
supports a single TDML file, which makes it difficult to share schemas across 
different TDML files.

It almost feels like the ideal approach would be to scrap JUnit altogether and 
have a custom TDML test interface. This could then scan all TDML files and 
figure out which schemas need to be compiled. It could detect which schemas are 
shared among different TDML files and run them together. And it could throw 
away schemas at exactly the right time because it would know when all tests are 
done with a schema. This interface would have all the necessary information to 
run tests efficiently because it actually knows what a TDML file is.
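
To make the idea above a bit more concrete, here is a small sketch in Java of 
the grouping step only. Everything in it is hypothetical: the class 
SchemaGrouping, its method names, and the simplifying assumption that each 
TDML file uses exactly one schema (real TDML files can reference more than 
one). It produces a plan in which each schema is compiled once, all TDML 
files that share it run together, and the schema is discarded right after.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of schema-aware test scheduling; not an existing API. */
final class SchemaGrouping {

    /** Inverts "TDML file -> schema" into "schema -> TDML files that use it". */
    static Map<String, List<String>> groupBySchema(Map<String, String> schemaByTdmlFile) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : schemaByTdmlFile.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** Builds an ordered plan: compile a schema, run its TDML files, discard it. */
    static List<String> plan(Map<String, String> schemaByTdmlFile) {
        List<String> steps = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : groupBySchema(schemaByTdmlFile).entrySet()) {
            steps.add("compile " + group.getKey());
            for (String tdmlFile : group.getValue()) {
                steps.add("run " + tdmlFile);
            }
            // Every test that needs this schema has run, so it can be dropped now.
            steps.add("discard " + group.getKey());
        }
        return steps;
    }
}
```

The plan is returned as plain strings only to keep the sketch short; a real 
runner would compile and execute instead. The point is that no cache or 
garbage collector heuristics are needed once the scheduler knows which tests 
share a schema.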

Unfortunately, that's a pretty sizable effort, especially since we'd have to 
create separate plugins for supported IDEs. A cache is certainly the best short 
term solution, and is probably sufficient long term, even though it feels a bit 
hacky to me.

> Performance regression in TDML processor
> ----------------------------------------
>
>                 Key: DAFFODIL-2627
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2627
>             Project: Daffodil
>          Issue Type: Bug
>          Components: TDML Runner
>    Affects Versions: 3.2.0
>            Reporter: Josh Adams
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 3.3.0
>
>
> While working on a customer project we noticed a significant increase in the 
> amount of time it took to run our test suite (over 600 tests) after upgrading 
> from Daffodil 2.7.0 to 3.2.1.  We were seeing roughly a 4x increase in time 
> to complete the same set of tests.
> I've narrowed the performance regression to commit 
> 0700ee8dc9531497f3e8b0fdf9266f8e3b105c27, which involved a removal of the 
> schema compilation cache, which is likely causing the schema to need to be 
> recompiled much more frequently.
> We use a relatively large schema (over 10,000 lines), but it is the same 
> schema used for all tests.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
