falkzoll commented on PR #5442: URL: https://github.com/apache/openwhisk/pull/5442#issuecomment-1778874198
Hi @joni-jones, maybe you or someone else in this PR can help us understand an issue in the https://github.com/apache/openwhisk-runtime-nodejs runtime. The scheduled openwhisk nodejs runtime builds are currently failing (https://github.com/apache/openwhisk-runtime-nodejs/actions). They fail with a timeout (after 30s) in a test case that checks the concurrent invocation capability of this runtime, i.e. the number of concurrent invokes/runs a single action container can handle (for nodejs:18 and nodejs:20):

```
runtime.actionContainers.NodeJs18ConcurrentTests > action-nodejs-v18 should allow running activations concurrently FAILED
    java.util.concurrent.TimeoutException
        at org.apache.openwhisk.core.containerpool.AkkaContainerClient$.$anonfun$executeRequest$1(AkkaContainerClient.scala:252)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
```

The same happens for nodejs:20. This kind of concurrency is currently only supported by the nodejs runtimes. Other runtimes only support `__OW_ALLOW_CONCURRENT=false` and can handle just one invoke/run per action container at a time.
To run the tests, the GitHub Actions workflow of the nodejs runtime clones the latest apache/openwhisk repository and uses its master branch as the base for its tests (https://github.com/apache/openwhisk-runtime-nodejs/blob/master/.github/workflows/ci.yaml). The failing test case sends 128 (https://github.com/apache/openwhisk-runtime-nodejs/blob/c60a6676375d85878c658412162004848c19f965/tests/src/test/scala/runtime/actionContainers/NodeJsConcurrentTests.scala#L32) parallel action invoke/run requests to a single nodejs runtime action container. The Scala test uses the AkkaContainerClient to open these 128 parallel connections (https://github.com/apache/openwhisk-runtime-nodejs/blob/c60a6676375d85878c658412162004848c19f965/tests/src/test/scala/runtime/actionContainers/NodeJsConcurrentTests.scala#L24). The nodejs test action running inside the action container (https://github.com/apache/openwhisk-runtime-nodejs/blob/c60a6676375d85878c658412162004848c19f965/tests/src/test/scala/runtime/actionContainers/NodeJsConcurrentTests.scala#L41) is written to wait for 128 incoming invokes before it starts answering any of them (it uses global variables for this). This means the first run request to this action is held open, without a response, until the 128th run request has reached the action code in this container. This way the test can be sure that the runtime can keep this number of concurrent action invokes open at the same time. In the GitHub Action this test usually takes far less than 5 seconds, while its timeout is 30s.

Debugging it locally showed that, with the current implementation in this PR, the test no longer seems to reach the required 128 parallel connections. Once a certain number of open connections is reached (far fewer than 128), no further connections seem to be opened, or they arrive very, very slowly. As a result, the pending invokes in the action container do not get a response in time and the test case fails after 30s.
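To make the mechanism concrete, here is a minimal, self-contained JavaScript sketch. It is not the actual test action from `NodeJsConcurrentTests.scala` and does not model AkkaContainerClient; `makeBarrierAction`, `makeBoundedClient` and `runScenario` are hypothetical names. The barrier action keeps module-level state (the "global variables") and holds every activation open until `expected` activations have arrived. The bounded client is an assumed stand-in for any client that can have at most `maxConnections` requests in flight. It shows why such a test cannot complete, and ends in a timeout rather than an error, when fewer than 128 requests can reach the container at once.

```javascript
// Hypothetical stand-in for the concurrency test action (not the real one):
// module-level state shared by all activations in one container holds every
// activation open until `expected` activations have arrived.
function makeBarrierAction(expected) {
  let arrived = 0; // activations that have reached the action code so far
  const waiting = []; // resolve callbacks of activations still held open

  return function main(params) {
    return new Promise((resolve) => {
      arrived += 1;
      waiting.push(resolve);
      if (arrived === expected) {
        // The last expected activation releases all held activations at once.
        for (const release of waiting) release({ count: arrived });
      }
    });
  };
}

// Hypothetical client that keeps at most `maxConnections` requests in flight
// and queues the rest, similar in spirit to a bounded connection pool.
function makeBoundedClient(action, maxConnections) {
  let inFlight = 0;
  const queue = [];

  const startNext = () => {
    while (inFlight < maxConnections && queue.length > 0) {
      const { params, resolve } = queue.shift();
      inFlight += 1;
      action(params).then((result) => {
        inFlight -= 1; // free the slot, then let a queued request start
        resolve(result);
        startNext();
      });
    }
  };

  return (params) =>
    new Promise((resolve) => {
      queue.push({ params, resolve });
      startNext();
    });
}

// Fires `n` invocations at once and reports whether all of them completed
// before `timeoutMs` elapsed.
async function runScenario(n, maxConnections, timeoutMs) {
  const invoke = makeBoundedClient(makeBarrierAction(n), maxConnections);
  const all = Promise.all(Array.from({ length: n }, (_, i) => invoke({ i })));

  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });

  const outcome = await Promise.race([all, timeout]);
  clearTimeout(timer);
  return outcome === "timeout" ? { ok: false } : { ok: true, responses: outcome.length };
}

// 128 requests with room for all of them: the barrier opens immediately.
runScenario(128, 128, 30000).then((r) => console.log("maxConnections=128:", r));
// 128 requests but only 32 can be in flight: the barrier never opens.
runScenario(128, 32, 1000).then((r) => console.log("maxConnections=32:", r));
```

Run with `node`, the first scenario reports all 128 responses almost immediately, while the second only settles when its 1-second timeout fires. A cap on in-flight requests below 128 would produce the same symptom as the failing test; this is only a hypothesis consistent with the observation above, not a diagnosis of what changed in the PR.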
Reverting to the previous commit (0c27a650ab6073e131e5c74002465e93cf4d8621) resolves the issue, and the concurrency test completes successfully within the usual few seconds for nodejs:18 and nodejs:20. Looking at the changes in this PR, it does not seem that the nodejs runtime tests need to be adapted. Still, given your broader Akka background: do we need to modify something in the nodejs runtime tests to consume the latest apache/openwhisk core? Any hints are welcome :-)

This is an automated message from the Apache Git Service.
