[this post is available online at https://s.apache.org/InsideInfra-Chris ]

"Inside Infra" is a new interview series with members of the ASF Infrastructure 
team. The series opens with an interview with Chris Thistlethwaite, who shares 
his experience with Sally Khudairi, ASF VP Marketing & Publicity.

- - -
"I get very attached to the technology that I'm working with and the 
communities that I'm working with, so if a server goes down or a site's acting 
wonky, I take that very personally. That reflects on how I do my job."
- - -

- Let’s start with you telling us your name --how is it pronounced?

It’s “Chris Thistle-wait” --I don’t correct people who say “thistle-th-wait”-- 
that’s also correct, but our branch of the family doesn’t pronounce the second 
“th”.

- What’s your handle if people are trying to find you? I know you’re "christ" 
(pronounced "Chris T") on the internal ASF Slack channel.

Yeah --anything ASF-related is all under "christ".

- Do people call you "Christ"?

They do! I first started in IT around Christmastime and was doing desktop 
support and office-type IT. When people started putting in tickets, and my 
username was "christ" there, they were asking "why is Christ logging into my 
computer right now?" and it became a thing. When I was hired at the ASF I told 
Greg (Stein; ASF Infrastructure Administrator) about that story, and he said 
"you gotta go with that for your Apache username."

- When and how did you get involved with the ASF?

A long time ago I started getting into Linux and Open Source, and naturally 
progressed to httpd (Apache HTTP Server). Truth be told, that’s where it 
started and stopped, but I’ve always been interested in Open Source and working 
with projects and within communities. Three years ago I was looking for a new 
job and stumbled across the infra blog post for a job opening. I fired up an 
email, sent it off to VP Infrastructure, and that's how everything started. The 
ramp-up of the job was diving deep into everything there is with the ASF and 
Open Source --which I'm still doing. I don't think I've found the bottom of the 
ASF yet.

- How long have you been a member of the Infrastructure team?

This November will be my fourth year.

- What are you responsible for in ASF Infrastructure?

Infrastructure has a whole bunch of different services that are used by both 
Apache projects as well as the Foundation itself: the Infrastructure team 
builds, monitors, supports, and keeps all those things running. Anything from 
Jenkins to mailing lists to Git and SVN repositories; and on the back end of 
things we keep everything working for the Foundation itself within, say, SVN or 
mailing lists: keeping archives of those things, keeping your standard security 
and permissions set up and split out. Anyone you ask on the Infra team will 
say: "I do everything!" It's too hard to explain --it's quite possibly a little 
bit of everything that has anything to do with technology --as broad as it can 
possibly be.

- So you really have to be a jack-of-all-trades. Do you have a specialty, or 
does everybody literally do everything?

Everyone on the team generally does everything --for the most part any one of 
us can jump into the role of anyone else on the team. Everyone has a deep 
knowledge of a particular service, or a handful of services, that they'll take 
care of 
--like, Gavin (McDonald; ASF Infrastructure team member) knows more about 
Jenkins and the buildbot and build services than most people on the team. At 
any one given point we’re on call and need to be able to fix something or take 
a look at something, so everyone needs to be versed enough in how to 
troubleshoot Jenkins. That can also be said for not just services that we 
offer, but also parts of technology, like MySQL or Postgres or our mail system 
or DNS: we do have actual physical hardware in some places, and we have VMs 
everywhere too, so sometimes we’re troubleshooting a bad backplane on a server 
or why a VM is acting the way it is. There's a very broad knowledge base that 
all of us have but there are specifics that some people know more about than 
others.

- How does ASF Infrastructure differ from other organizations?

There are a lot of similarities but a ton of differences. A big part of how 
Infra is different is, to use a "Sally-ism": if you look at it on paper, it 
wouldn't work --I've heard you describe the ASF that way. If you explained the 
way things work at the Foundation to somebody, they would literally think that 
you're making it up and there's no way that it would possibly be working the 
way that it does. There's a lot of that with the Infrastructure team too. There 
are many people I keep in contact with that I've worked with over the years, 
from my first job where we would buy servers, unbox them, rack them, wire them 
up, set them up, and run them from the office next door to us. I'd be impressed 
whenever I had 25 servers running in our little "data center" at that job, and 
now I talk to these guys about what we do at the ASF: we have 200 servers in 
10+ different data centers, we're vendor-agnostic, and we make it all work. 
They ask: "how the heck do you do that?!" We just do --it's an interesting 
thing as to how it all works together because we solve problems that others 
have as well, but their problems are often centralized to one thing, or a data 
center that they control and own, or one cloud provider that they control and 
own, where they deal with a single vendor and possibly at most have to talk 
with the same vendor in two different geographical areas. We're having to deal 
with stuff with one cloud vendor that's a VM and other stuff on the other side 
of the world that's actual hardware running in a co-location or data center, 
and the only thing that makes them the same is that they're on the Internet.

It's a good summation of the team too, in that we're all based in locations 
around the world; we're not all in one spot doing something.

- Describe your typical workday. Since you're all working on different things 
on such a huge scale, what's it like to be you?

"It's amazing" [laughs]. Everyone on the team generally has some project or 
projects that they are working on --long-running things for Infra. 

I'm currently working on rewriting a script for Apache ID creations. The 
process of putting your ICLA in, sending off to the Secretary, the Secretary 
says, "OK good," puts in all your data, and that gets put into a file in SVN 
...currently, we have a script that we manually run that does a bunch of checks 
on the account and whatnot, and then creates it, sends off a welcome email, 
whatever. I'm rewriting that because it's an old script and it's in several 
different languages --it's actually six scripts that all run off of one script. 
I'm consolidating that into one script, in a language that we support, and then 
moving forward with it into something that we could potentially automate, 
versus me having to run a script manually a couple of times a day.
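
As a rough, hypothetical sketch of that kind of consolidation --this is not the 
actual Infra tooling; the field names and helper functions here are invented 
purely for illustration-- a single script can chain the checks, the account 
creation, and the welcome email that previously lived in separate scripts:

    import re
    import sys

    def validate(record):
        """Run all pre-creation checks in one place (formerly separate scripts)."""
        problems = []
        if not re.fullmatch(r"[a-z][a-z0-9]{2,}", record.get("uid", "")):
            problems.append("invalid or missing requested user id")
        if not record.get("icla_on_file"):
            problems.append("no ICLA recorded")
        if not record.get("email"):
            problems.append("no contact email address")
        return problems

    def create_account(record):
        # Placeholder for the provisioning step (e.g. writing to a directory service).
        print("would create account %r" % record["uid"])

    def send_welcome(record):
        # Placeholder: a real script would hand this off to the mail system.
        print("would send welcome email to %s" % record["email"])

    def process(record):
        problems = validate(record)
        if problems:
            print("refusing to create account: " + "; ".join(problems))
            return 1
        create_account(record)
        send_welcome(record)
        return 0

    if __name__ == "__main__":
        sys.exit(process({"uid": "newcommitter",
                          "email": "new@example.org",
                          "icla_on_file": True}))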

Fluxo (the ID/handle for Apache Infra team member Chris Lambertus) was working 
on some mail archive stuff in our mail servers. Gavin (Apache Infra team member 
Gavin McDonald) is working on some actual build stuff. Everyone has kind of 
"one-two punch" tasks that they work on during the day, and then the rest of 
the time is (Jira) tickets or staying on top of Slack, if people are asking 
questions in the Infra channel or in our team channel or something like that. 
The rest of it is bouncing around inside the ASF and checking things out, or 
finding out new projects to work on, or ways to improve such-and-such process. 

- How many requests does Infra usually receive a day, in general?

Over the past three years, we've resolved an average of 6 Jira tickets a day, 
year-round. We've had 213 commits to puppet repositories in the last 30 days. 
We handle thousands of messages on our #asfinfra Slack channel, and have had 
659 different email topics in the last year.

- Dovetailing that, how do you keep your workload organized?

Everyone on the team does it their own personal way. I have a whiteboard and a 
Todoist list. We also have Jira to keep our actual tickets prioritized and 
running. We have a weekly team meeting/call to talk about things that are 
going on, which is also the more social aspect of what we do week-to-week.

- How do you get things done? You're juggling a lot of requests --what's the 
structure of the team? How do you prioritize when things are coming in? Is 
there a go-to person for certain things? If you're sharing everything, how do 
you balance it and who structures it? How does that work? 

To one end, the funnel to us starts with Greg and David (ASF Infrastructure 
Administrator Greg Stein and VP Infrastructure David Nalley). It's different 
from other places that I've worked, where I'm on a team of other systems 
administrator/engineering people, and we have a singular, customer-facing site. 
Someone says, "Hey, this should be blue instead of red," there's a ticket and 
we make the change and then it goes to production.

There are many different ways to get a hold of the Infrastructure team. Everyone 
gets emails about Jira tickets and gets updated as soon as one of those comes 
in. If it's something that you know about --say, the Windows nodes that we 
handle-- those all fall into my wheelhouse because I'm the last one to work 
with Windows extensively. Everyone else knows how to work with them, but it 
makes more sense for me to pick it up in some cases. 

Most of the stuff in Jira is very "break-fix" kind of stuff. A lot of the 
requests on Slack are too, for example: "DNS is busted," and we fix DNS. It's a 
very quick, conversational, "Let me go change that," or, "I'm going to go fix 
that real quick." Of course, some of the Jira tickets are very long-running, 
but the end result is they're fixing something that used to work. 

We were originally running git.apache.org, and Git WIP, so we hosted our own 
internal Git servers and we would read-only mirror those out to GitHub. 
Somewhere along the line, Humbedooh (the ID/handle for Apache Infrastructure 
team member Daniel Gruno) started writing out Gitbox or building Gitbox based 
on the need to have writable GitHub repositories. He built Gitbox and set up 
with the help of some other people on the team, got it going, and that became 
our replacement for git.apache.org. While we still host our own Git 
repositories, people are free to either write to ours or write to GitHub, and 
the changes are instantaneously mirrored between the two.

We had Git hosted at the ASF, and had GitHub as a read-only resource. The need 
arose to have writes on both sides: Humbedooh went and built out MATT (Merge 
All The Things), which does all of the sync between GitHub and our Git 
instance. 

MATT started a while ago, and Humbedooh added on to that to enable writes to 
GitHub. Basically what all that does is once your Apache ID is created, or if 
you have one already, you go on ID.apache.org, you add your GitHub username in 
there and then MATT --there's another part of that called Grouper-- MATT and/or 
Grouper will run periodically, pull data from our LDAP system and say, "Oh, 
ChrisT at apache.org has ChrisT as his GitHub ID. I'll pull those down." It 
says, "ChrisT is in the Infrastructure group. Hey look, there's an 
Infrastructure group in GitHub. I'll give ChrisT write access to the GitHub 
project." In a nutshell, that's what that does.

There's a ton of other house cleaning things, if you get removed from the LDAP 
group ... we run LDAP and keep all this stuff straight. If you get removed from 
the Infrastructure group at LDAP then MATT/Grouper will go and say, "Oh, this 
person's not in this LDAP group but they do have access in GitHub. Let me pull 
that so that they don't have access to that any more." It does housekeeping of 
everything as well as additions to groups and that kind of thing. There's a ton 
of technical backend to that, and that's what Humbedooh's doing. 
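
To make the sync concrete --this is a simplified, hypothetical sketch of the 
reconciliation Chris describes, not MATT or Grouper themselves; the group 
names, usernames, and stubbed data sources are invented-- the periodic job 
boils down to comparing LDAP group membership with GitHub team membership and 
correcting the difference in both directions:

    # Stub data: LDAP group -> GitHub usernames (as mapped via id.apache.org records).
    ldap_groups = {
        "infrastructure": {"christ", "humbedooh", "example-user"},
    }

    # Stub data: GitHub team -> usernames that currently have write access.
    github_teams = {
        "infrastructure": {"christ", "former-member"},
    }

    def reconcile(group):
        """Grant access to LDAP members missing on GitHub; revoke access that is stale."""
        wanted = ldap_groups.get(group, set())
        current = set(github_teams.get(group, set()))

        for user in sorted(wanted - current):
            print("add %s to GitHub team %s" % (user, group))    # real job: GitHub API call
            current.add(user)

        for user in sorted(current - wanted):                    # the housekeeping pass
            print("remove %s from GitHub team %s" % (user, group))
            current.discard(user)

        github_teams[group] = current

    if __name__ == "__main__":
        for group in sorted(ldap_groups):
            reconcile(group)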

At first when Git and GitHub were set up, it was fine: the ASF has to keep a 
canonical record of everything that goes into each project, so you could only 
write to our Git repos. Then it was conveniently mirrored out to GitHub because there's 
a lot of tools that GitHub has that we didn't have or weren't prepared to set 
up. GitHub has a very familiar way of doing things for a lot of developers. 
Once GitHub Writable came along with Gitbox and the changes to MATT, that 
opened up a whole other world of tools for people on projects to use. If they 
wanted to use pull requests on GitHub, they could start using pull requests on 
GitHub to manage code. They could wire up their build systems to GitHub with 
Jenkins so that whenever a PR was submitted and got approved, it would kick off 
a build in Jenkins and go through unit tests and do all the lovely things that 
Jenkins does.

It was really an evolution of, "Here's the service that we have. Someone, 
somewhere, be it infrastructure or otherwise, once they have writable GitHub 
access, here we go." And here's the swath of things that now opens up to 
projects inside the ASF: a project could come and set up with us and then 
never, ever actually commit code directly to the ASF --it would always go to 
GitHub but still be safe and saved on our own Git servers for ASF project 
reasons.

At the same point, we saw a need and said, "Let's build this out and go." 
Another funnel that comes into us is when we're on-call, something breaks and 
we ask, "Why do we do it this way? We should be doing it a different way." We 
then come up with a project to fix that or build it. It's a very interesting 
process of how work gets into the Infrastructure team.

It's been an interesting ride with that one.

There's always stuff that we're working on and fixing and making better. For 
the most part, Gitbox as it is now is kind of in a state of "It's getting 
worked on". If there are bugs that need to be fixed, they get fixed, but I don't know what 
the next feature request is on Gitbox. There's talk of other services ...like 
GitLab. If someone wanted to write code and put it in GitLab as opposed to  
GitHub, then someone would need to come in and write the connector from Gitbox 
to GitLab. So it's possible. I don't know if that's necessarily an 
Infrastructure need as much as it is a volunteer need for infra. But it's a 
system that can be set up to any other Git service as long as someone goes in 
and writes that.

- You brought up an interesting point here, which is volunteers. Do volunteers 
contribute to Infra also? 

We sometimes have volunteers, yes. We have a lot of people on the infra mailing 
lists that will bounce ideas back to us or they'll work on a ticket or put in a 
pull request.

- Well, the need is not as critical because you have a paid team, versus Apache 
projects. 

Right. That's exactly true. There's a bit of a wall that we have to have 
because we work with Foundation data, which not everyone has access to. 
Granted, we're a non-profit, Open Source company and everything's out there to 
begin with, but usernames and passwords of databases and things that we have 
encrypted that the team has access to aren't necessarily something that you 
would want any volunteer to have access to.

- How do you stay ahead of demand? This is a really interesting thing because 
part of it is you're saying, "Necessity is the mother of invention." You guys 
are doing stuff because you've got those binary, "break-fix" types of 
scenarios. In an ideal situation, do you even have enough runway to be able to 
optimize your processes? How do you have the opportunity to fix things and 
improve things as you're going along if you're firefighting pretty much all day 
long?

That's a really good question about just how our workflow is. In other 
companies that I've been in, there's the operations people that are doing the 
"break-fix", and then there's the development people that are doing "the next 
big thing". The break-fix folks are spinning the plates and keeping them spun 
without breaking, and that's a lot of firefighting. That's literally all that 
job is. Even when you're not firefighting, you're sitting around thinking about 
firefighting in a sense of, “when is this going to fall over again? If it does 
fall over, what can we do to fix it so it doesn't do that anymore?" And in the 
past, the break-fix guys, the firefighters, would end up saying, "Hey, there's 
this thing that needs fixed." And it would fall over the wall to the 
developers. They would develop the fix for it, and then it would go back into 
production and then the cycle continues. 

To some extent, that's kind of where DevOps came from: if you merge the two of 
those together, then while you're firefighting you can also write the fix for 
the problem, and then you don't have to wait for the lag between the two. We 
don't have that split here. Everyone on the team is firefighting with one hand 
and typing out the solution with another. And a lot of the times our project 
work, like getting a new mail server spun up or my task to rewrite the workflow 
for new Apache ID creations, I've been working on that for a very long time 
because it will keep falling off ... it gets put on the backburner while we're 
like, "Hey, we found out that our TLP servers are getting hammered with 
downloads from apps and people trying to use them instead of the mirror 
servers." So, let's set up downloads.apache.org and we can funnel stuff over to 
that so that that server can get hammered and do whatever it needs to do so 
that our www. site and all the Apache Project websites stay up and running in a 
more reliable way.

- What's the size of the teams that you were dealing with before that had a 
firefighting team and a dev team versus ASF infra?

The last "big" corporate job I had was ...six ops people that kept the site 
going, four database people, another eight technical operations-type people… 
all told, it was about thirty.

There were technically thirty firefighting people and we had a NOC (network 
operations center) that was literally people that only watched dashboards and 
watched for alerts. Whenever those went off, they'd call the firefighting people. 
The NOC was another 20 people. And then the development teams were ... twenty 
to fifty people.

- What kind of consumer base were they accommodating? Does it match the volume 
that ASF has? Was it more of a direct, enterprise type of, "We have a customer 
that's paying, we have to respond to them" situation? Or is it different?

This was at a financial services company that transacted on their Website: 
completely different from the type of stuff we're dealing with here at the ASF. 
Volume-wise, they were much smaller, but it was much more ...visible, as their 
big times were at the start of market and end of market. After end-of-market 
came all the processing for the day to get done before markets started the next 
day. The site had to be up 100% of the time. We had SLAs of five minutes. If 
you got paged or something broke, you had to get the page and respond to it in 
a way of, "Hey, this is what's going on and these are the people that I need 
involved with it," all within five minutes of it going off. That was the way 
the management structure was. It was intense.

In scale, Apache probably does way more: we do way more traffic across all of 
our services in any given day. If someone doesn't get mail for a little bit, 
then they come and tell us or we get alerted of it by our systems, and we go 
and we fix it and we take care of it. But with the financial services group, 
people were losing money: dealing with people and money is just a very 
stressful situation for anyone working in technology because you have to get it 
right and it has to be done as fast as possible before someone's kids can’t go 
to college anymore. It was a completely different minefield to navigate.

- The type of stress that's involved or the type of demand or the pressure is 
different, but you also have the responsibility with ASF that systems have to 
be up and running. I understand it's not mission critical if something goes 
down for more than five minutes, which is different in the financial sector, 
but do you feel that same type of pressure? Is it there or is it completely 
different for you? 

No, it's not completely different --I think I do feel that pressure, because we 
also have SLAs here: they're just not five minutes. 
We have structure around that and the way that we handle uptime and that kind 
of thing. I get very attached to the technology that I'm working with and the 
communities that I'm working with, so if a server goes down or a site's acting 
wonky, I take that very personally. That reflects on how I do my job. If a 
server's not working or if something's broken either because of me or something 
externally that's going on, I want to get that up and running as fast as 
possible because that's how I would expect anyone to work in a field that has 
...any technology field, for that matter. And generally, that's the same 
attitude the rest of the team has as well.

- How has ASF Infra changed over the years?

It's matured quite a bit. When I first started, it was Gavin, Fluxo, Humbedooh, 
Pono (former ASF Infrastructure team member Daniel Takamori), and me. There 
were five of us. The amount of stuff that we got done, I'm like, "Man, there's 
no way that five people can do this."

- That's kind of what I'm pointing at. If you're a team of eight or five or 
twelve or whatever, compared to the other thing that you did with the other job 
that had maybe a core team of twenty, thirty --that in itself is insane.

We were five people, everything was very, "Here's the shiny thing we're working 
on," and then something else would come up and we'd have to jump on that. Then 
something else would come up and we'd have to jump on that. We were very ...I 
don't want to say we were stretched thin, but there wasn't necessarily ...time 
for improvement.

There was a lot of stuff we still had on physical hardware, and a couple of 
vendors that we no longer use. But things were moving more towards a 
configuration-based infrastructure with Puppet instead of one person building a 
machine, setting up all the configs themselves, installing everything and then 
letting it go off into the ether to run and do its job. We were moving 
everything towards Puppet to where you configure Puppet to configure the 
server. So then if the server breaks, or goes down or goes away or we need to 
move vendors or whatever, all you need to do is spin up a new server somewhere 
else, give it the Puppet config, it configures itself and then goes off into 
the ether to run and do whatever it needs to do.
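
As a toy illustration of that declarative, configuration-first idea --this is 
hypothetical Python, not Puppet, and the managed file below is invented for the 
example-- the point is that you describe the desired end state and let the tool 
converge the machine to it, no matter how many times it runs:

    import os

    # Desired state: files that must exist with specific contents. (A real
    # configuration-management tool also handles packages, services, users, etc.)
    desired_files = {
        "/tmp/example-motd": "managed by configuration, not by hand\n",
    }

    def ensure_file(path, contents):
        """Make the file match the desired contents; change nothing if it already does."""
        if os.path.exists(path):
            with open(path) as handle:
                if handle.read() == contents:
                    return "unchanged"
        with open(path, "w") as handle:
            handle.write(contents)
        return "updated"

    if __name__ == "__main__":
        for path, contents in desired_files.items():
            print(path, ensure_file(path, contents))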

- That's great. More automation.

Right. We were automating a lot more stuff right when I first started. Over the 
course of the next year, the team kind of ebbed and flowed a little bit until 
we were eight in the last year. We started to get to the point of "where can we 
point the gun to next? What can we target next to get it taken care of and 
done?" That's where we started taking on more specific infra projects, for 
instance, mail. Our mail server has been around since the dawn of time, and 
it's virtualized so it moves servers every now and then, but the same base of 
it is quite old by technology standards.

Fluxo started moving this on to newer stuff and he got that going. We started 
taking care of projects that were not broken, but needed to be worked on. 
Instead of waiting for it to break, we're fixing and upgrading and moving down 
that path versus firefighting, break-fix, that kind of thing. We were moving 
more towards, "Hey, I see a problem. I have time. I'm going to take care of 
that and make that into a more serviceable system." 

Automation has helped quite a bit with that. I also think that as the team 
grew, it got to a point where tickets, emails, and chat were getting responded 
to quicker. And then we also could focus more on the tools that we use for the 
Foundation. Like, HipChat was going away. We needed a new chat platform, so we 
chose Slack. And then we updated and moved everything over to Slack, and that's 
where we are with that. It started coming into its own, with workflows of, 
"Oh, okay. How do we get this done? Let's go do that."

- What areas are you experiencing your biggest growth? Is it a technical area? 
Like, "Hey, all of a sudden mail's out of control"? Or, "Hey, we need to 
satiate the demand for more virtual machines," or is it a geographic influence 
that's coming in in terms of draw? Where are you guys pointing all your guns to?

Currently we're trying to get more out to the projects and talk to people more 
often. Not that we didn't do that before, because at ApacheCons and any Meetups 
that we had, Infra would always have a table. We were always accessible, but we 
were always passively accessible. We weren't really going out and talking to 
projects proactively to say, "Hey. What do you guys need from us? What are we 
doing with this?" So I think that's one part of it, something that we're 
moving towards a little bit. It's not at all technical, but more of a 
foundation-broadening, community-broadening thing that we're doing.

That's one part of it. The other thing that we're doing, from a more technical 
or infrastructure standpoint, is really trying to get our arms around all of 
the services we provide, and then really take a look at those and say: how is 
this used inside the ASF? How is it used in the industry as a whole? Do we need 
to put more time and energy towards those things in order to make the offerings 
of the Infrastructure team a little bit more of a solid platform? Generally, 
that ... and on top of any other automation and that kind of stuff, I think 
those are really the two spots that I see Infra growing in a lot in the next 
year-ish: really boiling down our services to, "Hey, we've seen a lot of people 
using this. And a lot more projects are using this. It's not just a flash in 
the pan. We need to build out more infra around blah service, so let's really 
do that and make that a solid platform to use."

- What do you think people would be surprised to know about ASF Infra? When you 
tell someone something about your job and they go, "Whoa, I had no idea" or, 
"That's crazy." What would people be surprised to know?

That Apache has an infrastructure team. [laughs]

- Why are you saying that?

Because honestly, I don't think a lot of people know about the Infrastructure 
team. Those that do, have used us for something, not used us for something, 
have talked to us about something, and worked with us on something. Those that 
don't are like, "Oh, I didn't know the ASF paid people to be here," --that kind 
of thing. Those are kind of the two reactions I've gotten from people. It's like, 
"Oh, that's cool. You work for the infrastructure team." Shrug. And then the 
other people are like, "Oh, sweet. Yeah, that's great. I know Gav. I've worked 
with him on blah, blah, blah." But that's not necessarily surprising. I mean, 
it is in a sort of way. 

- When people ask, "What are you doing for work?" and you say you work for ASF, 
do people even know what that is? Do they know what you're doing? Do they care? 
Are they like, "Oh, okay. Whatever"?

There are literally three types of people that I've run into that ask, "Oh, what 
are you doing for work?" One person is the person that has no idea what the ASF 
is, not even the vaguest hint of Apache, and they're like, "Oh, okay. That's 
cool." There's that next person that does, and may or may not know about the 
ASF but knows of Apache, the Web server, or some other lineage of that.  
They're like, "Oh, whoa. That's super cool. It's impressive.” That's wild. Then 
the third people ask "Why are ‘Indians’/Native Americans running software? That 
doesn't make any sense to me" and "Are you on a reserve?" I swear to God I've 
gotten that question before. I don't even know how to answer that. I'm like, 
"No, buddy."

- Are these technologists or are these just guys off the street? Are they in 
the industry?

Guys off the street. I say Apache Software Foundation, and to them "Apache" 
and "software" together don't make sense. Actually I've gotten mean tweets 
too whenever I've been tweeting about being at ApacheCon. Things like I'm 
"taking away" from Native Americans and whatever...

- We also get that on Twitter, on the Foundation side: we get included in 
tweets about some kind of violation along the lines of, "Stand up for the ..." 
I get it. From time to time we also get sent these "How dare you?" letters, 
that sort of thing. It's an interesting challenge, the whole "why do Native 
Americans run this thing?" misinterpretation. Let's move on. 
What's your favorite part of your job?

The whole job is my favorite part of the job.

- That's funny because everyone at Infra ... You know how people have bad days 
or may be grumpy or whatever, in general you guys seem to all like each other. 
You all have a great camaraderie. You all get along. You work really closely 
together. It's a very interesting thing to see from the outside. Is that true? 
Or are you just playing it up? Does it really work that way?

That's absolutely true. I've found that generally speaking, when you get a 
bunch of nerds together, they either really like each other and everything 
works or they really don't like each other and nothing gets done. The team is 
great, and it's like no other team I've ever worked with before. But it's very 
odd because you go through the interview process, and the interviews are 
interviews. I mean, you get to know people in interviews, but not really. Then 
you start working with people, and at some point you start getting below the 
surface. And at some point you get deep enough to where you find out whether or 
not ...how you gel with all these people. 

It's very odd that all of us have the same general sense of humor. We'll talk 
about food non-stop in the channel, and recipes and cooking, and different 
beers or different whatevers. It's nice to get to that point with a team that 
you're comfortable enough with everybody to ... like I said, I've been here 
three years and there is still so much that I don't know, both technical and 
non-technical, about the ASF. I ask very dumb questions in channel and say, "I 
have no idea why this is doing this this way," or, "Can someone else take a 
look?" or, "I don't know what I'm doing here." And never in the entire time 
I've been here, from the day one until now, has anyone ever chastised me for 
not knowing something or said anything about the way that I work or something 
like that. Well, at least not in channel. At least not publicly. 

Everyone's very supportive. It doesn't matter if you know everything there 
possibly is to know about one singular product or thing you're working on, or 
don't know anything about it. You can ask questions and really learn about why 
it was done the way it was done, or figure out how to fix a problem. No problem 
on the team. It's just like, "Okay, yeah. This is what you have to do." Or, 
"Here's a document. Read up on it." Or, "I don't know either." And then out of 
that comes an hour of conversation and then a document pops out, and then the 
next person that asks, we can say, "Here, go read the doc." Yeah. I mean, we're 
all very happy. Very happy.

- Which is really good. Looking back when you first started, what was your 
biggest challenge when you came onto the team?

Oh man. I look back at that and I feel like the learning curve was ... It 
wasn't a curve. It was a wall. I've used Linux, I've used Ubuntu for a while 
and various other flavors of Debian and whatnot, so getting spun up on all of 
...expanding my Linux knowledge was a big deal, expanding everything about the 
ASF and how it works. Which I'm still trying to figure out. If you know, send 
me something to read to figure out how that all works. I mean, I don't want to 
sound like I was completely out of my depth and I have no idea what I'm doing, 
but I feel like I was completely out of my depth and I had no idea what I was 
doing. 

There's a lot about the ASF that is just tribal knowledge, and there's a lot 
about Infra that's tribal knowledge. It's just no one has anything written down 
--"the server's been running under Jim's desk for the last 15 years in a 
basement that has battery backups and redundant Internet, so it's never gone 
down. But don't ever touch that server, because if it goes down, then all of 
our mail goes down" or whatever. There was a lot of figuring all that out for 
myself and digging around. Which, frankly, is one of the parts that I really 
enjoy: just, "Hey, this thing broke. I've no idea what that thing is. I've 
no idea where it lives," and just diving in and trying to figure out what's 
going on with it and how it's built, and then the hair trigger that sets it off 
to crash and never work again. Yeah. That's an interesting question too.

- What are you most proud of in your Infra career to date? You're talking about 
overcoming these challenges, I'm always curious just to see what people are 
like, "Yeah, I'm patting myself on the back for that one" or, "Ta-da. That's my 
ta-da moment."

I did lightning talks at ApacheCon Las Vegas and didn't get a phone call from 
you when I was done. [laughs]

- I wasn't at lightning talks --what did you say? What would make me call you?

I didn't say it. We were on stage, and it's John (former ASF Infrastructure 
team member John Andrunas), Drew (ASF Infrastructure team member Drew Foulks), 
and I, and we figured we'd do lightning talks: "Hey, we're the new guys: ask us 
infrastructure questions." A week or two before ApacheCon, there was a massive 
outage at a particular vendor. It wasn't: "Oh, our server's down for a while," 
the server went down and then it was *gone*. It got erased from the vendor 
side. I can't remember what service it was. There was something that 
disappeared two weeks before Vegas and never came back. 

It wasn't just us, though: tons of companies had this issue. So we're on stage 
answering questions, and someone asks where this service went: "What happened 
to XYZ?" And John has the mic and he goes, "You should probably go ask [vendor 
name]." At that point it was very widely published that the vendor"s response 
was like, "Whoops, someone tripped over the cord that powered the data center. 
And when it came back up, then deleted all of your VMs.” They totally 
acknowledged it and they didn't give refunds for it, so it was a little bit of 
a PR kerfuffle for them. The vendor is in the other room handing out buttons 
and stickers, and John was like, "Oh yeah, go ask the [vendor] guys what 
happened to your server. That's their fault," he said it jokingly but my jaw 
dropped. 

- [laughs] No one told me this story. No one said anything. Someone's trying to 
protect you. I had no idea this happened ...oh my gosh.

Well, David Nalley was in the back of the room, and he's screaming with his 
hands cupped around his mouth, "Don't badmouth the vendor and the sponsors." I 
deflected and quickly moved onto something else. [laughs]

But yes, that's another good question that I haven't actually reflected on. 
Looking back and seeing where Infra was when I first started and where it is 
now, it was a very runnable and very good team then, and it's a very runnable 
and very good team now. I feel like a lot of the work that I've done and 
a lot of the work that the team has done over the last three years has been 
getting from a spot of "everything's on fire, who's holding up what this 
weekend?" to things being stable and us nitpicking on whether or not something 
needs to be updated or not. That's huge. That's a big step from like starting a 
company and treading water to being profitable and having resources to do other 
things versus just keeping your employees paid. I mean, it's a big step for a 
company and it's a big step for Infrastructure.

- I love your talking about how you guys are tightly-knit and all that. How 
would your co-workers describe you?

The other odd part about that too is being completely remote and not having 
day-to-day, face-to-face interactions with people. You get a very odd sense of 
people through text for a 24-hour period that you're online reading stuff. It's 
a different perspective than if I was in the office every day, working on 
something and interacting with people. Even though every day, except for the 
weekends, I'm online talking to these guys and doing stuff. How would they 
describe me? Dashingly good looking and ... I don't know. [laughs]

- I know that Infra's "just Infra," right --you guys are all under the Infra 
umbrella. Do you have a title? When you got hired, what do they call you?

We're all systems administrators. The only person that actually has a title is 
Greg, and he's Infrastructure Administrator.

- What are the biggest threats you face? For infra folks or systems 
administrators or infrastructure administrators even, what do you need to watch 
out for these days? What's big in the industry? Is everyone saying, "Oh, XYZ's 
coming"? In terms of your role in the job: is there something that you need to 
keep your eye on? Is there something that you would advise other people, "If 
you're in this job keep an eye out for blah, this is a new threat" or anything 
along those lines?

General scope stuff. 16 years ago, everything was hardware: you bought hardware 
and you had to physically put it somewhere. And virtual machines came along 
about the same time. People were starting to do virtual stuff to where you 
could have a physical machine and then multiple machines running on that, 
sharing resources. Then cloud and infrastructure as a service, and everything's 
been moving more and more towards that over the years.

Of course, there are still people that work in office IT, doing desk support 
stuff or office infrastructure type things. Those are still a majority of how 
things run at companies. As everything is moved more towards the cloud or 
hosted services, more systems administrators are becoming more like software 
engineers. And software engineers are becoming more like systems 
administrators. They're kind of melding into one, big group of people. Now of 
course, there are still people that only write software. But gone are the days 
where it used to be someone would write some code and say, "I need to deploy it 
and get it out to all these computers." They would write the code, they'd hand 
it off to a systems person. Systems would go and configure whatever server 
to get it out to however many machines and hit the button and go. The software 
developer never really needed to know hardware specifics of the systems that it 
was going to run on. And the systems people never really needed to know what 
software packages this was being put together from. There are exceptions to 
that, but for the most part ... 

Over the years, it's fallen into a thing now where the software developer knows 
exactly what systems this is going to run on and how it's going to run there, 
so it's more efficient and things work better and they're releasing less buggy 
code based on the fact that they know they're closer to the hardware. And the 
systems people, they want to troubleshoot it more and work with it and fix 
problems because they're closer to the software and know more about its 
internal workings and how it's going to run on systems. Everything is getting 
more and more chunked down: first it was VMs, then it's cloud, then it's 
containers with Docker and things like that, and it's going to get more 
virtualized down into that --knowing about Docker orchestration and things like 
Kubernetes and Apache Mesos. The reality is people run Kubernetes, people run 
Docker, people run everything. That's the interesting thing in terms of how we 
do it at the ASF: we don't require folks to do just one thing.

In terms of where the industry's going ... everything's getting pushed down to 
"a developer can work in a container on a set of systems, write software for 
that and then deploy that to a machine themselves, never involving a systems 
engineer at all, and build a product using that." It's getting stuff out the 
door faster, and it's also chasing what's been the unicorn of the industry for 
a while now ... even today: I developed this thing, it works on my machine. If 
I move it over to another computer, it stops working. Why? What's the problem 
with that? Containers fix that problem. The container you run on my system runs 
the same way as it does on every system everywhere. It takes the "runs on my 
machine" thing out of the equation. 

- What's your greatest piece of advice? What would you tell aspiring sysadmins?

Part of the ASF is the community behind it, and a giant part of that is what 
makes it work. I mean, you could say all of it. That's what makes everything 
work with this. Right when I first started the sysadmin kind of thing, I didn't 
get into Meetups and Linux Users Groups and any of that stuff. I didn't get 
into the network. I didn't go into the community that I had around me. And 
honestly, I don't know if that's because it didn't exist or because I didn't 
know about it or what, but now that I'm older and wiser, the community part of 
it is really ...there's a massive benefit to that. Aside from socialization, or 
networking and how to get a better job through networking, getting together 
with like-minded people and talking through your problems is an amazing tool to 
use. And I didn't do that enough when I was a sysadmin starting out. Looking 
back, what I sort of regret not doing was really sharing knowledge with other 
people in the community and building a group of people 
that I could ping ideas off of, or help with other ideas, or share in the 
knowledge of, "Hey, this is what's going on in the industry" or, "Hey, I saw 
this at work the other day. How do we work around that?" or that kind of thing. 
It's much easier these days with social media: the never-ending amounts of 
social media. But it's a big, important part of my day-to-day now, that I wish 
I had 16 years ago.

- That's powerful. OK, If you had a magic wand, what would you see happen with 
ASF infra?

If I had a magic wand, I'd update our mail server instantly or maybe magic wand 
a few other projects.

- Wait. I know you're joking, but what is the problem with the mail server?

It's running on an older version of FreeBSD that doesn't play well with our 
current tools. Some form of that server has been upgraded, patched, moved, 
migrated, etc. for the last 20 years. We want to bring it up to more modern 
standards. Mail runs fine for the most part, but it's probably the most 
critical service we have at the ASF and we want to make sure everything 
continues to hum along. Because of that, it's a huge project that touches a ton 
of different parts of our infrastructure.

- How big is it?

It's all of our email. Every email that goes through an apache.org address.

This is a huge project and Chris (Lambertus) has been working on it for a while 
--it's not a simple thing to fix. It's very, very complicated. We couldn’t do 
it without him.

Back to the magic wand thing: I'd wish for more wands. 

- - -

Chris is based in Pennsylvania on UTC -4. His favorite thing to eat during the 
workday is chicken ramen.

# # #

