areusch commented on pull request #9129:
URL: https://github.com/apache/tvm/pull/9129#issuecomment-932996398


   so originally the request was essentially about the integration tests, 
which we run in smaller sets (e.g. relay, topi, etc.). when a test in an early 
set fails, results from the later ones aren't reported. this change isn't quite 
the same, but the same argument applies for why you may not want fail-fast: for 
example, if a test fails in the `ci_arm` container, you may not know whether 
it's also failing in `ci_gpu`, or vice versa. i agree CI is not a personal testing 
environment, but it is sometimes the easiest way for developers to access cloud 
platforms they don't have locally, e.g. ARM or GPU.
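   to make the tradeoff concrete, here's a toy sketch (not TVM's actual CI logic; the group names and pass/fail outcomes are made up) of what fail-fast costs you in reported results:

```python
# Toy model of fail-fast vs. run-everything reporting. With fail_fast=True,
# a failure in the first group means the later groups never run, so you
# learn nothing about them until the next CI round trip.

def run_groups(groups, fail_fast):
    """Run each (name, passes) group in order; return the results we learn."""
    results = {}
    for name, passes in groups:
        results[name] = "PASS" if passes else "FAIL"
        if fail_fast and not passes:
            break  # later groups are cancelled: no result reported for them
    return results

# Hypothetical outcomes: "relay" fails, the rest would have passed.
groups = [("relay", False), ("topi", True), ("gpu", True)]

print(run_groups(groups, fail_fast=True))   # {'relay': 'FAIL'}
print(run_groups(groups, fail_fast=False))  # {'relay': 'FAIL', 'topi': 'PASS', 'gpu': 'PASS'}
```

   with fail-fast, fixing the relay failure and re-running may just surface the next failure, turning one CI round into several.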
   
   @Mousius the comment you referenced is a bit more general, and i'm not sure 
this specific issue contributes much to CI taking a while to complete. you can 
monitor CI if you're anxious for the test results. one effort in progress is 
`xdist`, which should have a somewhat bigger impact without making it 
harder to access a test platform you don't have locally. i'm not opposed to 
changing CI to improve developer productivity, but could you motivate this 
specific change a bit more? in practice this seems most likely to result in 
cancellation of GPU integration tests, but the [number of available GPU 
executors](https://ci.tlcpack.ai/label/GPU/load-statistics?type=hour) has not 
been 0 in the past month. perhaps we should track that stat for a bit now that 
#9128 is in. i'm wondering whether it has already somewhat addressed this 
concern.
   
   @jroesch your comment is a bit generic. i'd still like to see more 
rationale for cancelling the GPU unit tests when an ARM one fails.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

