Re: Updating Instaclustr donated Jenkins Agents

Fleming, Jackson via dev Fri, 24 May 2024 13:30:32 -0700

Thanks Brandon and Mick,

This is exactly the feedback I was looking for. The last thing we want to do is 
reduce the throughput of the already strained CI pipelines.

Sounds like it's a bigger task than just cutting over to ARM. Brandon, I want to 
reassure you that we certainly won't change anything without discussing it on 
this thread first, especially if we'd be reducing the number of boxes 
available by ~21% for no immediate value.

I'll be in touch next week. Enjoy your weekends.

Thanks,
Jackson

________________________________
From: Mick Semb Wever <m...@apache.org>
Sent: Saturday, May 25, 2024 2:37:14 AM
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: Updating Instaclustr donated Jenkins Agents

Jackson,
  we are very thankful for all the donations from Instaclustr.  Getting people 
(and resources) involved in ARM maintenance and testing is desperately needed.  
More detailed feedback below.

On Fri, 24 May 2024 at 16:08, Brandon Williams <dri...@gmail.com> wrote:
On Thu, May 23, 2024 at 5:51 PM Fleming, Jackson via dev
<dev@cassandra.apache.org> wrote:
> Primarily this would be moving from x86 instances to Graviton ARM based ones, 
> as we’ve seen a pretty good uptake of ARM usage, and we’d like to help ensure 
> that there’s good testing coverage across both x86 and ARM architectures.

I just want to note that this will reduce the x86 pool from 42 to 33,
and then we will have a parallel pipeline of 9 ARM agents (15 if the
other 6 come back.)  Currently I think we have about an 8 hour
post-commit run time with 42 machines (though I'm sure there is room
for improvement.)

Today only artifact/packaging jobs are routinely run on ARM servers, due to 
their limited number.  They are currently disabled waiting on INFRA-25819.

There are test jobs for ARM, but they are not run routinely, as they are not 
part of any branch's pipeline.  The last run of any ARM job was 1 yr 8 mo ago.  
That was mostly to discover what the ARM test failures were, on a one-off basis 
(iirc there's a small handful, like supporting the older snappy compression 
option).

To include testing on arm in pre- and post- commit, we would need to
1. fix all failures, and
2. have a lot more arm agents.

We currently have 42 x86 agents.  If we took away 9 we'd see throughput reduce 
to ~75% (turn-around times become 1.3x longer).

And then, if we included arm testing in the pipeline the bottleneck would be 
the new 15 arm agents, meaning the overall throughput reduces to ~35%  
(turn-around times become 2.8x longer).

Our biggest hurdle to begin with is really people's time, not hardware.  When 
we get to the hardware problem, 15 agents will be quite limiting (and likely 
deemed not enough).

Note, the standalone jenkinsfile in 5.0+ was designed to make running arm CI 
jobs (also on your own k8s) much easier.
