stevedlawrence opened a new pull request, #865:
URL: https://github.com/apache/daffodil/pull/865

   > Note: I got really tired of the really long CI times, so decided to 
attempt to try speed up our integration tests (which are the biggest offenders) 
and tackle DAFFODIL-2381 at the same time. As a proof on concept, I've only 
updated the two largest integration test suites: `TestCLIparsing` and 
`TestCLIdebugger`. With these changes, these suites went from 250 seconds and 
165 seconds respectively, to 17 seconds and 6 seconds. So from almost 7 minutes 
to less than 30 seconds. I think this would be a big win if we updated the rest 
of the integration tests, but I wanted to get some thoughts on this new API and 
if the are any thoughts or alternative approaches before doing the rest of 
them, hence why this is just an RFC.
   
   The current integration tests have too much boilerplate that can be 
refactored with some new APIs. Additionally, the integration tests are 
incredibly slow due to all the forking and creation of new VMs. This fixes both 
of these issues.
   
   The new integration test API creates a new `runCLI` function. The first 
parameter is an array of CLI string arguments (with a new "args" string 
interpolator to make specifying this array easier). The second argument is a 
function that contains the normal `expect` logic all our of integration tests 
already use. The third argument is the expected exit code.
   
   A new `path` function is created to help work with paths in a OS agnostic 
manner.
   
   A new `withTempFile` function is added to help with creating temporary files 
in a special temporary daffodil directory, and to delete the files when the 
test is complete.
   
   With these new functions, integration tests now look something like this:
   
   ```scala
       val schema = path("path/to/schema.dfdl.xsd")
       runCLI(args"parse -s $schema") { cli =>
         cli.sendInput("data to parse", inputDone = true)
         cli.expect("<data>string to expect</data")
       } (ExitCode.Success)
   ```
   
   The runCLI function creates new streams for capturing stdin/out/err. A new 
thread is run that calls the org.apache.daffodil.Main run function, and 
configures Main and log4j to output to these streams. A new expect instance is 
also configured so that it can read/write to these streams and validate the 
output, with a new CLITest class used to provide helper wrappers to make expect 
logic a bit less verbose.
   
   Multiple changes were also made to Main and the Debugger to ensure they 
always use the provided streams. This is really only needed for this 
integration testing, but could potentially be useful in other use cases where 
stdin/out/err need need to be mimicked without forking a new process.
   
   Note that this approach isn't thread safe, but integration tests are already 
not run in parallel. By using threads instead of spawning new processes, we 
drastically decrease the overhead, with test suites now taking a number of 
seconds instead of a number of minutes, averaging an order of magnitude speed 
up.
   
   This also ensures coverage is enabled in the github workflow every time we 
execute sbt. Otherwise, changes of coverage enablement causes a recompile. By 
doing this we avoid recompiling the same files multiple times which saves a 
couple of minutes overall. Note that we originally disabled coverage on some 
steps because it caused failures, but whatever caused that seems to have been 
fixed.
   
   DAFFODIL-2381


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to