Re: [CentOS] Design changes are done in Fedora

Warren Young Mon, 29 Dec 2014 18:06:07 -0800

On Dec 29, 2014, at 4:03 PM, Les Mikesell <lesmikes...@gmail.com> wrote:

> On Mon, Dec 29, 2014 at 3:03 PM, Warren Young <w...@etr-usa.com> wrote:
>> 
>> the world where you design, build, and deploy The System is disappearing 
>> fast.
> 
> Sure, if you don't care if you lose data, you can skip those steps.

How did you jump from incremental feature roll-outs to data loss?  There is no 
necessary connection there.

In fact, I’d say you have a bigger risk of data loss when moving between two 
systems released years apart than two systems released a month apart.  That’s a 
huge software market in its own right: legacy data conversion.

If your software is DBMS-backed and a new feature changes the schema, you can 
use one of the many available systems for managing schema versions.  Or, roll 
your own; it isn’t hard.
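
A minimal sketch of the roll-your-own approach, assuming SQLite as the DBMS and its built-in user_version counter as the place to record the schema version.  The table, column names, and the MIGRATIONS list are hypothetical, made up purely for illustration:

```python
import sqlite3

# Hypothetical migrations: entry N-1 upgrades the schema to version N.
MIGRATIONS = [
    # version 1: initial schema
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    # version 2: a new feature needs a grade column
    "ALTER TABLE students ADD COLUMN grade INTEGER",
]

def migrate(conn):
    """Apply every migration newer than the version stored in the database."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current + 1, len(MIGRATIONS) + 1):
        # Each step runs in its own transaction, so a failed step
        # leaves the database at the last good version.
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[version - 1])
            conn.execute(f"PRAGMA user_version = {version:d}")
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
final_version = migrate(conn)
```

Running migrate() again on an up-to-date database applies nothing, which is what lets the same code ship to sites sitting at different versions.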

You test before rolling something to production, and you run backups so that if 
all else fails, you can roll back to the prior version.

None of this is revolutionary.  It’s just what you do, every day.

> when it breaks it's not the developer answering
> the phones if anyone answers at all.

Tech support calls shouldn’t go straight to the developers under any 
development model, short of sole proprietorship, and not even then, if you can 
get away with it.  There needs to be at least one layer of buffering in there: 
train up the secretary to some basic level of cluefulness, do everything via 
email, or even hire some dedicated support staff.

It simply costs too much to break a developer out of flow to allow a customer 
to ring a bell on a developer’s desk at will.

>> The world is moving toward incrementalism, where the first version of The 
>> System is the smallest thing that can possibly do anyone any good.  That is 
>> deployed ASAP, and is then built up incrementally over years.
> 
> That works if it was designed for rolling updates.  Most stuff isn’t,

Since we’re contrasting with waterfall development processes that may last many 
years, but not decades, I’d say the error has already been made if you’re still 
working with a waterfall-based methodology today.

The first strong cases for agile development processes were made about 15 
years ago, so anything started 7 years ago (to use the OP’s example) was 
already disregarding a shift a full software generation old.

> some stuff can't be.

Very little software must be developed in waterfall fashion.

Avionics systems and nuclear power plant control systems, for example.  Such 
systems make up a tiny fraction of all software produced.

A lot of commercial direct-to-consumer software also cannot be delivered 
incrementally, but only because the alternative messes with the upgrade 
treadmill business model.

Last time I checked, this sort of software only accounted for about 5% of all 
software produced, and that fraction is likely dropping, with the moves toward 
cloud services, open source software, subscription software, and subsidized 
software.

The vast majority of software developed is in-house stuff, where the developers 
and the users *can* enter into an agile delivery cycle.

>> Instead of trying to go from 0 to 100 over the course of ~7 years, you 
>> deliver new functionality to production every 1-4 weeks, achieving 100% of 
>> the desired feature set over the course of years.
> 
> If you are, say, adding up dollars, how many times do you want that
> functionality to change?

I’m not sure what you’re asking.

If you’re talking about a custom accounting system, the GAAP rules change 
several times a year in the US:

   http://www.fasb.org/jsp/FASB/Page/SectionPage&cid=1176156316498

The last formal standard put out by FASB was in 2009, and they’re working on 
another version all the time.  Chances are good that if you start a new 7-year 
project, a new standard will be out before you finish.

If instead you’re talking about the cumulative cost of incremental change, it 
shouldn’t be much different than the cost of a single big-bang change covering 
the same period.

In fact, I’d bet the incremental changes are easier to adopt, since each change 
can be learned piecemeal.  A lot of what people are crying about with EL7 comes 
down to the fact that Red Hat is basically doing waterfall development: many 
years of cumulative change gets dumped on our HDDs in one big lump.

Compare a rolling release model like that of Cygwin or Ubuntu (not LTS).  
Something might break every few months, which sounds bad until you consider 
that the alternative is for *everything* to break at the same time, every 3-7 
years.

I’m not arguing for CentOS/RHEL to turn into Ubuntu Desktop.  I’m just saying 
that there is a cost for stability: every 3-7 years, you must hack your way 
through a big-bang change bolus.

(6-7 years being for those organizations that skip every other major release by 
taking advantage of the way the EL versions overlap.  EL5 was still sunsetting 
as EL7 was rising.)

>> This isn’t pie-in-the-sky theoretical BS.  This is the way I’ve been 
>> developing software for decades, as have a great many others.  Waterfall is 
>> dead, hallelujah!
> 
> How many people do you have answering the phone about the wild and
> crazy changes you are introducing weekly?

The burden of tech support has more to do with proper QA and roll-out 
strategies than with the frequency of updates.

For the most part, we roll new code to a site in response to a support call, 
rather than field calls in response to an update.  The new version solves their 
problem, and we don’t hear back from them for months or years.

We don’t update all sites to every new release.  We merely ship *a* new release 
every 1-4 weeks, which goes out to whoever needs the new features and fixes.  
It’s also what goes out on each new server we ship.

> How much does it cost to train them?

Most of our sites get only one training session, shortly after the new system 
is first set up.

We rarely get asked to do any follow-up training.  The users typically pick up 
on the incremental feature updates as they happen, without any additional help 
from us.  We attribute that to solid UX design.

That first session is mostly about giving the new users an idea of what the 
system can do.  We teach them enough to teach themselves.

How often do most people get trained to use a word processor?  I’ll bet a lot 
of people got trained just once, in grade school.  They just cope with changes 
as they come.

The worst changes are when you skip many versions.  Word 97 to Word 2007, for 
example. *shudder*

>> I don’t mean that glibly.  I mean you have made a fundamental mistake if 
>> your system breaks badly enough due to an OS change that you can’t fix it 
>> within an iteration or two of your normal development process.  The most 
>> likely mistake is staffing your team entirely with people who have never 
>> been through a platform shift before.
> 
> Please quantify that.  How much should a business expect to spend per
> person to re-train their operations staff to keep their systems
> working across a required OS update?  Not to add functionality.  To
> keep something that was working running the way it was?

If you hire competent people, you pay zero extra to do this, because this is 
the job they have been hired to do.

That's pretty much what IT/custom development is: coping with churn.

Most everything you do on a daily basis is a reaction to some change external 
to the IT/development organization:

- Capacity increases

- Obsolete ‘ware upgrades

- New seat/site deployments

- Failed equipment replacements

- Compatibility breakage repair (superseded de facto standard, old de jure 
standard replaced, old proprietary item no longer available…)

- Tracking business rule change (GAAP, regulations, mergers…)

- Effecting business change (entering new markets, automation, solving new 
problems developing from new situations…)

- Tracking business strategy change (new CEO, market shift…)

Setting aside retail software development, IT and internal development 
organizations *should* be chasing this kind of thing, not being “proactive.”  
We’re not trying to surprise our users with things they didn’t even ask for, 
we’re trying to solve their problems.

Maybe we solve problems in a *manner* our users did not expect — hopefully a 
better way — but we’re rarely trying to innovate, as such.

> how much developer time would you expect to spend to
> follow the changes and perhaps eventually make something work better?

Pretty much 100%, after subtracting overhead.  (Meetings, email, breaks, 
reading…)

Again: This is what we do.  Some new thing happens in the world, and we go out 
and solve the resulting problems.

The only question is one of velocity: the more staff you add, the faster you 
go.  So, how fast do you want to go?

(Yes, I’ve read “The Mythical Man Month.”  The truths within that fine book 
don’t change the fact that Microsoft can develop a new OS faster than I can all 
by my lonesome.)

>> The software system I’ve been working on for the past 2 decades has been 
>> through several of these platform changes.
> 
> How many customers for your service did you keep running non-stop
> across those transitions?

Most of our customers are K-12 schools, so we’re not talking about a 24/7 
system to begin with.  K-12 runs maybe 9 hours a day (7am - 4pm), 5 days a 
week, 9 months out of the year.  That gives us many upgrade windows.
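
As a rough back-of-the-envelope check on those numbers, assuming 9 months works out to about 39 weeks (my approximation, not a measured figure):

```python
# Figures from the schedule above; WEEKS_IN_SESSION is an assumption
# (9 months taken as roughly 39 weeks).
HOURS_PER_DAY = 9
DAYS_PER_WEEK = 5
WEEKS_IN_SESSION = 39

in_use_hours = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_IN_SESSION
hours_per_year = 24 * 365

# Share of the year the system is actually in use by the school.
in_use_fraction = in_use_hours / hours_per_year
idle_fraction = 1 - in_use_fraction
```

Under that assumption the system is in use about a fifth of the year, leaving the rest as potential upgrade time.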

We rarely change out hardware or the OS at a particular site.  We generally run 
it until it falls over, dead.

This means we’re still building binaries for EL3.

This also means our software must *remain* broadly portable.  When we talk 
about porting to EL7, we don’t mean that it stops working on EL6 and earlier.  
We might have some graceful feature degradation where the older OS simply can’t 
do something the newer one can, but we don’t just chop off an old OS because a 
new one came out.

All that having been said, we do occasionally roll a change to a site, live. We 
can usually do it in such a way that the site users never even notice the 
change, except for the changed behavior.

This is not remarkable.  It’s one of the benefits you get from modern 
centralized software development and deployment stacks.

>> Everyone’s moaning about systemd...at least it’s looking to be a real de 
>> facto standard going forward.
> 
> What you expect to pay to re-train operations staff -just- for this
> change, -just- to keep things working the same..

You ask that as if you think you have a no-cost option in the question of how 
to address the churn.

Your only choices are:

1. Don’t upgrade

2. Upgrade and cope

3. Switch to something else

Each path carries a cost.

You think path 1 is free?  If you skip EL7, you’re just batching up the 
changes.  You’ll pay eventually, when you finally adopt a new platform.  One 
change set plus one change set equals about 1.9 change sets, plus compound 
penalties.

Penalties?  Yes.

You know the old joke about how you eat an elephant? [*]  By the time you eat 
1.9 elephants, you’ve probably built up another ~0.3 change sets worth of new 
problems.  Time you spend grinding through nearly two full change sets is time 
you don’t spend keeping your current backlog short.

We call this technical debt in the software development world.  It’s fine to 
take out a bit of technical debt occasionally, as long as you don’t let it 
build up too long.  The longer you let it build, the more the interest & 
penalties accrue, so the harder it is to pay down.

> We've got lots of stuff that will drop into Windows server versions
> spanning well over a 10 year range.

Yes, well, Linux has always had a problem with ABI stability.  Apparently the 
industry doesn’t really care about this, evidenced by the fizzling of LSB, and 
the current attacks on the work at freedesktop.org.  Apparently we’d all rather 
be fractious than learn to get along well enough that we can nail down some 
real standards.

Once again, though, there’s a fine distinction between stable and moribund.

> And operators that don't have a
> lot of special training on the differences between them.

I’ve never done much with Windows Server, but my sense is that they have plenty 
of churn over in their world, too.  We’ve got SELinux and systemd, they’ve got 
UAC, SxS DLLs, API deprecation, and tools that shuffle positions on every 
release.  (Where did they move the IPv4 configuration dialog this time?!)

We get worked up here about things like the loss of 32-bit support, but over in 
MS land, they get API-of-the-year.  JET, ODBC, OLE DB, or ADO?  Win32, .NET 
desktop, Silverlight, or Metro?  GDI, WinG, DirectX, Windows Forms or XAML?  On 
and on, and that’s just if you stay within the MSDN walls.

>> Could it be that software for these other platforms *also* manages to ride 
>> through major breaking changes?
> 
> Were you paying attention when Microsoft wanted to make XP obsolete?
> There is a lot of it still running.

Were you paying attention when Target’s XP-based POS terminals all got pwned?

Stability and compatibility are not universal goods.

>>> What enterprise can afford to rewrite all of its software
>>> every ten years?
>> 
>> Straw man.
> 
> Not really.  Ask the IRS what platform they use.   And estimate what
> it is going to cost us when they change.

Monopolies are inherently inefficient and plodding.  Government is special only 
because it is the biggest monopoly.

(That’s why we have antitrust law: not because it’s good for the consumer, but 
because it fights the trend toward zaibatsu rule.)

Few organizations are working under such stringent constraints, if only because 
it’s a danger to the health of the organization.  Only monopolies can get away 
with it.

>> (The long dragging life of XP is an exception.  Don’t expect it to occur 
>> ever again.)
> 
> No, that is the way things work.   And the reason Microsoft is in business.

Microsoft stopped retail sale of Windows 7 a few months ago, and Vista back in 
April.

A few months ago, there was a big stink when MS killed off Windows 8.0 updates, 
requiring that everyone upgrade to 8.1.

Yes, I know about downgrade rights for pro versions of Windows.

Nevertheless, the writing is on the wall.

>> while your resources aren’t as extensive as Google’s, your problem isn’t 
>> nearly as big as Google’s, either.
> 
> So again, quantify that.  How much should it cost a business _just_ to
> keep working the same way?

Google already did that cost/benefit calculation: they tried staying on RH 7.1 
indefinitely, and thereby built up 10 years of technical debt.  Then when they 
did jump, it was a major undertaking, though one they apparently felt was worth 
doing.

There’s a cost to staying put, too.

> And why do you think it is a good thing
> for this to be a hard problem or for every individual user to be
> forced to solve it himself?

I never said it was a good thing.  I’m just reporting some observations from 
the field.

—————

[*] One bite at a time.
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
