potiuk commented on issue #5556:
URL: https://github.com/apache/skywalking/issues/5556#issuecomment-699077309
> "first of the runs matching..." means the latest run? We expect it to skip
cancelling the most recent run, i.e. only keep the most recent run and cancel
all previous runs, ignoring whether the runs are before/after the current
cancelling run.
Yep. The run IDs are monotonically increasing in GitHub Actions, so I sort them in
descending order and skip the first one.
> It's right for sure, we've been balancing the batch size and the running
time for so long to reduce the waiting time as much as possible, for
SkyWalking, we have 700 ~ 800+ test cases for every pull request, and now we
batch them into ~100 matrices, which overall takes about 1h and completely run
out of the slots, adding dependency will line up much more longer and take much
more time
Yeah, you simply have a lot of jobs, and I think the problem is that you have
a bit too many things to run :). If you saturate the available job slots,
delaying some of the jobs might actually run equally fast, because currently
your jobs are queuing anyway. But by delaying the startup of some of those jobs
you get the chance to cancel the duplicated ones, so this might be an overall
improvement. If I may advise you, it would be better to combine all the
different workflows you have into a single workflow with dependencies: then you
can first run Plugin tests, then E2E tests, etc. This is also very helpful if
you want to manually cancel the workflow (you can cancel all the different jobs
by cancelling the one workflow), and it simplifies the use of my
"cancel-workflow-run" action, because effectively you have just one workflow to
cancel.
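To illustrate what I mean, a single combined workflow with job dependencies
could look roughly like the sketch below. This is not your actual setup; the
job names, matrix entries and build/test scripts (`./mvnw`,
`./run-plugin-tests.sh`, `./run-e2e-tests.sh`) are made-up placeholders.

```yaml
# Hypothetical sketch of one combined workflow using "needs" for dependencies.
name: CI

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build agent and packages once
        run: ./mvnw -q package -DskipTests   # placeholder build command

  plugin-tests:
    needs: build                 # starts only after "build" succeeds
    runs-on: ubuntu-latest
    strategy:
      matrix:
        case: [case-1, case-2, case-3]   # placeholder for the real test matrix
    steps:
      - uses: actions/checkout@v2
      - name: Run one batch of plugin tests
        run: ./run-plugin-tests.sh ${{ matrix.case }}   # placeholder script

  e2e-tests:
    needs: plugin-tests          # runs after the plugin tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run E2E tests
        run: ./run-e2e-tests.sh   # placeholder script
```

Cancelling that one workflow then cancels everything that hangs off it.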
I looked at your jobs briefly and I think I have a proposal for how to speed
your builds up a lot (this is exactly what I've done in Airflow). I see that in
every job of yours you are building the SkyWalking agent and then a Docker
image with this agent built in, and in other jobs you are compiling and
building your packages.
My question is: are that agent and Docker image the same for all jobs (or for
big subsets of them)? From looking at your workflows it seems so. If yes, then
you might take the same approach I took in Airflow.
You can see what I've done in Airflow here:
https://github.com/apache/airflow/blob/master/CI.rst . Look at the end for the
sequence diagrams and some details, but it boils down to building the images
(and your agent) only once per PR and reusing that built image in all the jobs.
That saved us 3, 10 or 20 minutes per job (depending on the state of the
image), and looking at the number of jobs you have and the length of the
tests, if you can extract this step into a "workflow_run" workflow, it might
decrease the time needed for your builds a lot.
The "workflow_run" workflow has "write" access to your repo and it allows
you to push built images to GitHub Container registry - and then all the test
jobs simply pull that built image.
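As a rough sketch (workflow and image names are made up, the build commands are
placeholders, and this glosses over the security precautions discussed just
below), such a "build image" workflow could look something like this:

```yaml
# Hypothetical "build image" workflow triggered alongside the regular CI workflow.
name: Build CI Image

on:
  workflow_run:
    workflows: ["CI"]          # name of the PR/push workflow that triggers this one
    types: [requested]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      # The workflow definition itself always comes from the default branch;
      # here we check out the code of the commit that triggered the original run.
      - uses: actions/checkout@v2
        with:
          ref: ${{ github.event.workflow_run.head_sha }}

      - name: Log in to GitHub Container Registry
        # Assumes the token used here is allowed to push packages; a PAT stored
        # as a secret may be needed instead of GITHUB_TOKEN.
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin

      - name: Build and push an image tagged with the commit SHA
        run: |
          docker build -t ghcr.io/${{ github.repository }}/ci-image:${{ github.event.workflow_run.head_sha }} .
          docker push ghcr.io/${{ github.repository }}/ci-image:${{ github.event.workflow_run.head_sha }}
```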
However, it's a bit complex to get it right and secure. For security reasons
(write access), the "workflow_run" workflow always runs with the "master"
version of the workflow definitions, so what I do in Airflow is check out the
version of code that comes with the PR separately and use the "master" scripts
to build it. Then, in the "PR" workflow, I periodically pull the images to see
if they have already been pushed. This is one job that does "nothing" (just
checking whether the image is there) while the images are being built, but then
all the other jobs run much faster afterwards (in the worst case, instead of
20 minutes to build the image, it takes 1 minute to pull it).
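In the "PR" workflow, that "wait for the image" part can be a single polling
job that the whole test matrix depends on. Again, this is only a sketch with
made-up names, and exactly which SHA/tag to use depends on how the triggering
workflow is set up:

```yaml
# Hypothetical jobs inside the regular PR workflow: one job polls the registry
# for the pre-built image, and the matrix depends on it via "needs".
jobs:
  wait-for-image:
    runs-on: ubuntu-latest
    steps:
      - name: Wait for the CI image for this commit to appear in the registry
        run: |
          # Poll every 30 seconds, give up after roughly 20 minutes.
          for i in $(seq 1 40); do
            if docker pull ghcr.io/${{ github.repository }}/ci-image:${{ github.sha }}; then
              exit 0
            fi
            sleep 30
          done
          echo "Image never appeared" >&2
          exit 1

  plugin-tests:
    needs: wait-for-image        # the matrix only starts once the image exists
    runs-on: ubuntu-latest
    strategy:
      matrix:
        case: [case-1, case-2, case-3]   # placeholder for the real test matrix
    steps:
      - name: Run tests inside the pre-built image
        run: docker run --rm ghcr.io/${{ github.repository }}/ci-image:${{ github.sha }} ./run-plugin-tests.sh ${{ matrix.case }}
```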
I think in your case you can get HUGE improvements out of that. From looking
at your jobs, about 70% of each job (5 minutes for the image build and more
than 10 minutes for package building) is spent building the images/packages,
and almost all of it can be reduced to under 30 seconds. I think you can gain
a lot from it.
> > I am also not sure how your workflow looks like, but in our case there
are several preparatory steps before we launch "bigger number of jobs tests".
And they run for long enough time (and free slots) so that when the next commit
from the same branch is pushed, the "old" tests have not started yet and they
are not yet blocking the slots.
>
> We don't actually have such kind of preparatory steps in SkyWalking, the
heavy part is plugin tests and they don't have much preparation work, not to
say running a long time
I think the "build image/packages" if you separate them out to the
"workflow-run" . If you separate the build steps out, it will always take at
least 5-10 minutes before the image/package is ready and if anyone pushes a new
commit in that time, there are no matrix jobs started yet, and the slots are
not blocked.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]