potiuk commented on issue #5556:
URL: https://github.com/apache/skywalking/issues/5556#issuecomment-699077309


   > "first of the runs matching..." means the latest run? We expect it to skip 
cancelling the most recent run, i.e. only keep the most recent run and cancel 
all previous runs, ignoring whether the runs are before/after the current 
cancelling run.
   
   Yep. The run ids are monotonically increasing in GA, so I sort them in descending order and skip the first one.
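
   For illustration, the selection logic boils down to something like this (a minimal sketch, not the action's actual code):

```python
def runs_to_cancel(run_ids):
    """Pick the runs to cancel: run ids increase monotonically in
    GitHub Actions, so sorting in descending order puts the latest
    run first; we keep that one and cancel everything after it."""
    return sorted(run_ids, reverse=True)[1:]
```

   So for run ids `[101, 104, 102]`, run `104` is kept and `102` and `101` are cancelled.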
     
   > It's right for sure, we've been balancing the batch size and the running 
time for so long to reduce the waiting time as much as possible, for 
SkyWalking, we have 700 ~ 800+ test cases for every pull request, and now we 
batch them into ~100 matrices, which overall takes about 1h and completely run 
out of the slots, adding dependency will line up much more longer and take much 
more time
   
   Yeah, you simply have a lot of jobs, and I think the problem is that you have a bit too many things to run :). If you saturate the number of job slots, delaying some jobs might actually make the whole run equally fast, because currently your jobs are queuing anyway. But by delaying the startup of some of those jobs, you get the chance to cancel the duplicated ones, which might be an overall improvement. If I may advise you: it would be better to combine all your different workflows into a single workflow with dependencies. Then you can run the plugin tests first, then the E2E tests, etc. This is also very helpful if you want to manually cancel a run: you can cancel all the jobs by cancelling that one workflow, and it simplifies the use of my "cancel-workflow-run" action, because effectively you have just one workflow to cancel.
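
   A combined workflow with dependencies might look roughly like this (the job names and commands are placeholders, not your actual setup):

```yaml
name: CI
on: [pull_request]
jobs:
  plugin-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./mvnw test        # placeholder for the plugin test matrix
  e2e-tests:
    needs: plugin-tests         # starts only after plugin-tests succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./mvnw verify      # placeholder for the E2E suite
```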
   
   I looked at your jobs briefly and I think I have a proposal for how to speed your builds up a lot (this is exactly what I've done in Airflow). I see that in every job of yours, you are building the SkyWalking agent and then a Docker image with that agent built in, and you are compiling and building your packages in every job as well.
   
   My question is: are that agent and Docker image the same for all jobs (or for big subsets of them)? It looks like they are, from what I saw in your workflows. If yes, then you might take the same approach I took in Airflow.
   
   Take a look at what I've done in Airflow: https://github.com/apache/airflow/blob/master/CI.rst . The sequence diagrams near the end show the details, but it boils down to building the images (and, in your case, the agent) only once per PR and reusing that built image in all the jobs. That saves us 3, 10, or 20 minutes per job (depending on the state of the image), and given the number of jobs you have and the length of your tests, extracting this step into a "workflow_run" workflow might decrease your build times a lot.
   
   The "workflow_run" workflow has "write" access to your repo, which allows you to push built images to the GitHub Container Registry; all the test jobs then simply pull that built image.
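
   A sketch of such a "workflow_run" build workflow pushing to the GitHub Container Registry (the workflow name and image tag are assumptions, adjust to your setup):

```yaml
name: Build Images
on:
  workflow_run:
    workflows: ["CI"]           # the name of your PR workflow
    types: [requested]
jobs:
  build-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Log in to GitHub Container Registry
        # depending on your registry permissions, a PAT with
        # write:packages scope may be needed instead of GITHUB_TOKEN
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Build and push the shared image
        run: |
          TAG=ghcr.io/apache/skywalking-ci:${{ github.event.workflow_run.head_sha }}
          docker build -t "$TAG" .
          docker push "$TAG"
```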
   
   However, it's a bit complex to get this right and secure. For security reasons (write access), the "workflow_run" workflow always runs with the "master" version of the workflow files. So what I do in Airflow is check out the version of the code that comes with the PR separately and use the "master" scripts to build it. Then, in the "PR" workflow, I periodically pull the images to see if they have been pushed yet. This is one job that does "nothing" (checking whether the image is there) while the images are being built, but all the other jobs run much faster after that (in the worst case, instead of 20 minutes to build the image, it takes 1 minute to pull it).
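
   The "wait for the image" job in the PR workflow can be a simple polling loop, something like this (the image tag is again a placeholder):

```yaml
wait-for-image:
  runs-on: ubuntu-latest
  steps:
    - name: Wait until the image is pushed
      run: |
        IMAGE="ghcr.io/apache/skywalking-ci:${{ github.sha }}"
        until docker pull "$IMAGE"; do
          echo "Image not ready yet, retrying in 30 seconds..."
          sleep 30
        done
```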
   
   I think in your case you can get HUGE improvements out of that. From looking at your jobs, about 70% of each job (5 minutes for the image build and more than 10 minutes for package building) is spent building the images/packages, and almost all of it can be reduced to under 30 seconds.
   
   > > I am also not sure how your workflow looks like, but in our case there 
are several preparatory steps before we launch "bigger number of jobs tests". 
And they run for long enough time (and free slots) so that when the next commit 
from the same branch is pushed, the "old" tests have not started yet and they 
are not yet blocking the slots.
   > 
   > We don't actually have such kind of preparatory steps in SkyWalking, the 
heavy part is plugin tests and they don't have much preparation work, not to 
say running a long time
   
   I think the "build image/packages" step becomes exactly such a preparatory step once you separate it out into the "workflow_run" workflow. The build will always take at least 5-10 minutes before the image/package is ready, and if anyone pushes a new commit in that time, no matrix jobs have started yet and the slots are not blocked.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

