[H] Stevey's Google Platforms Rant (part 1)

Anthony Q. Martin Fri, 14 Oct 2011 10:24:16 -0700

Stevey's Google Platforms Rant

I was at Amazon for about six and a half years, and now I've been at Google for that long. One thing that struck me immediately about the two companies -- an impression that has been reinforced almost daily -- is that Amazon does everything wrong, and Google does everything right. Sure, it's a sweeping generalization, but a surprisingly accurate one. It's pretty crazy. There are probably a hundred or even two hundred different ways you can compare the two companies, and Google is superior in all but three of them, if I recall correctly. I actually did a spreadsheet at one point but Legal wouldn't let me show it to anyone, even though recruiting loved it.

I mean, just to give you a very brief taste: Amazon's recruiting process is fundamentally flawed by having teams hire for themselves, so their hiring bar is incredibly inconsistent across teams, despite various efforts they've made to level it out. And their operations are a mess; they don't really have SREs and they make engineers pretty much do everything, which leaves almost no time for coding -- though again this varies by group, so it's luck of the draw. They don't give a single shit about charity or helping the needy or community contributions or anything like that. Never comes up there, except maybe to laugh about it. Their facilities are dirt-smeared cube farms without a dime spent on decor or common meeting areas. Their pay and benefits suck, although much less so lately due to local competition from Google and Facebook. But they don't have any of our perks or extras -- they just try to match the offer-letter numbers, and that's the end of it. Their code base is a disaster, with no engineering standards whatsoever except what individual teams choose to put in place.

To be fair, they do have a nice versioned-library system that we really ought to emulate, and a nice publish-subscribe system that we also have no equivalent for. But for the most part they just have a bunch of crappy tools that read and write state machine information into relational databases. We wouldn't take most of it even if it were free.

I think the pubsub system and their library-shelf system were two out of the grand total of three things Amazon does better than Google.

I guess you could make an argument that their bias for launching early and iterating like mad is also something they do well, but you can argue it either way. They prioritize launching early over everything else, including retention and engineering discipline and a bunch of other stuff that turns out to matter in the long run. So even though it's given them some competitive advantages in the marketplace, it's created enough other problems to make it something less than a slam-dunk.

But there's one thing they do really really well that pretty much makes up for ALL of their political, philosophical and technical screw-ups.

Jeff Bezos is an infamous micro-manager. He micro-manages every single pixel of Amazon's retail site. He hired Larry Tesler, Apple's Chief Scientist and probably the very most famous and respected human-computer interaction expert in the entire world, and then ignored every goddamn thing Larry said for three years until Larry finally -- wisely -- left the company. Larry would do these big usability studies and demonstrate beyond any shred of doubt that nobody can understand that frigging website, but Bezos just couldn't let go of those pixels, all those millions of semantics-packed pixels on the landing page. They were like millions of his own precious children. So they're all still there, and Larry is not.

Micro-managing isn't that third thing that Amazon does better than us, by the way. I mean, yeah, they micro-manage really well, but I wouldn't list it as a strength or anything. I'm just trying to set the context here, to help you understand what happened. We're talking about a guy who in all seriousness has said on many public occasions that people should be paying him to work at Amazon. He hands out little yellow stickies with his name on them, reminding people "who runs the company" when they disagree with him. The guy is a regular... well, Steve Jobs, I guess. Except without the fashion or design sense. Bezos is super smart; don't get me wrong. He just makes ordinary control freaks look like stoned hippies.

So one day Jeff Bezos issued a mandate. He's doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion -- back around 2002 I think, plus or minus a year -- he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.

His Big Mandate went something along these lines:

1) All teams will henceforth expose their data and functionality through service interfaces.

2) Teams must communicate with each other through these interfaces.

3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

4) It doesn't matter what technology they use. HTTP, Corba, Pubsub, custom protocols -- doesn't matter. Bezos doesn't care.

5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

6) Anyone who doesn't do this will be fired.

7) Thank you; have a nice day!

Ha, ha! You 150-odd ex-Amazon folks here will of course realize immediately that #7 was a little joke I threw in, because Bezos most definitely does not give a shit about your day.

#6, however, was quite real, so people went to work. Bezos assigned a couple of Chief Bulldogs to oversee the effort and ensure forward progress, headed up by Uber-Chief Bear Bulldog Rick Dalzell. Rick is an ex-Army Ranger, West Point Academy graduate, ex-boxer, ex-Chief Torturer slash CIO at Wal*Mart, and is a big genial scary man who used the word "hardened interface" a lot. Rick was a walking, talking hardened interface himself, so needless to say, everyone made LOTS of forward progress and made sure Rick knew about it.
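Rules 1 through 4 of the mandate are easy to state and easy to underestimate. Here is a minimal Python sketch of what they imply, not anything Amazon actually ran: a hypothetical "inventory" team keeps its data private and exposes it only through an HTTP/JSON endpoint, and another team's code reaches that data only by making a network call. The `/stock/<sku>` path, the SKUs, and the JSON shape are made up for illustration; per rule 4, the transport could just as well have been Corba, pubsub, or a custom protocol.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical data owned by the "inventory" team. Under rule 3, no other
# team's code may read this dict directly; it lives here only for the demo.
_STOCK = {"B000123": 7, "B000456": 0}


class InventoryHandler(BaseHTTPRequestHandler):
    """Exposes inventory data only through an HTTP/JSON interface (rules 1 and 4)."""

    def do_GET(self):
        prefix = "/stock/"
        if not self.path.startswith(prefix):
            self._send(404, {"error": "unknown endpoint"})
            return
        sku = self.path[len(prefix):]
        if sku not in _STOCK:
            self._send(404, {"error": "unknown sku", "sku": sku})
            return
        self._send(200, {"sku": sku, "in_stock": _STOCK[sku]})

    def _send(self, status, body):
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def start_inventory_service():
    """Serve on an OS-assigned localhost port in a background thread; return (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), InventoryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


def get_stock(base_url, sku):
    """What another team's code does: call the interface over the network (rule 2)."""
    with urllib.request.urlopen(f"{base_url}/stock/{sku}") as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server, url = start_inventory_service()
    print(get_stock(url, "B000123"))
    server.shutdown()
    server.server_close()
```

The caller depends only on the URL and the JSON contract, so the inventory team could swap the dict for a database without the caller noticing. That same property is what makes rule 5 (externalizability) cheap later on.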

Over the next couple of years, Amazon transformed internally into a service-oriented architecture. They learned a tremendous amount while effecting this transformation. There was lots of existing documentation and lore about SOAs, but at Amazon's vast scale it was about as useful as telling Indiana Jones to look both ways before crossing the street. Amazon's dev staff made a lot of discoveries along the way. A teeny tiny sampling of these discoveries included:

- pager escalation gets way harder, because a ticket might bounce through 20 service calls before the real owner is identified. If each bounce goes through a team with a 15-minute response time, it can be hours (20 bounces at 15 minutes apiece is five hours) before the right team finally finds out, unless you build a lot of scaffolding and metrics and reporting.

- every single one of your peer teams suddenly becomes a potential DOS attacker. Nobody can make any real forward progress until very serious quotas and throttling are put in place in every single service.

- monitoring and QA are the same thing. You'd never think so until you try doing a big SOA. But when your service says "oh yes, I'm fine", it may well be the case that the only thing still functioning in the server is the little component that knows how to say "I'm fine, roger roger, over and out" in a cheery droid voice. In order to tell whether the service is actually responding, you have to make individual calls. The problem continues recursively until your monitoring is doing comprehensive semantics checking of your entire range of services and data, at which point it's indistinguishable from automated QA. So they're a continuum.

- if you have hundreds of services, and your code MUST communicate with other groups' code via these services, then you won't be able to find any of them without a service-discovery mechanism. And you can't have that without a service registration mechanism, which itself is another service. So Amazon has a universal service registry where you can find out reflectively (programmatically) about every service, what its APIs are, and also whether it is currently up, and where.

- debugging problems with someone else's code gets a LOT harder, and is basically impossible unless there is a universal standard way to run every service in a debuggable sandbox.
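The DOS-attacker point is where the "very serious quotas and throttling" come in. One common technique, assumed here rather than taken from anything Amazon documented, is a token bucket per calling team: each caller gets a sustained rate plus a burst allowance, and requests beyond that are rejected instead of starving everyone else. The quota table, team names, and default quota below are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a new caller can burst immediately
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Throttler:
    """One bucket per calling team, so a single noisy peer can't starve the rest."""

    def __init__(self, quotas: Dict[str, Tuple[float, float]],
                 default: Tuple[float, float] = (1.0, 1.0),
                 clock: Callable[[], float] = time.monotonic):
        self.quotas = quotas    # hypothetical: caller -> (requests/second, burst size)
        self.default = default  # quota for callers with no negotiated entry
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, caller: str) -> bool:
        if caller not in self.buckets:
            rate, capacity = self.quotas.get(caller, self.default)
            self.buckets[caller] = TokenBucket(rate, capacity, self.clock)
        return self.buckets[caller].allow()
```

A service would call `throttler.allow(caller_id)` at the top of each request handler and, when it returns False, send back an error such as HTTP 429 (Too Many Requests). The injectable `clock` is there so the behavior can be tested without sleeping.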

That's just a very small sample. There are dozens, maybe hundreds of individual learnings like these that Amazon had to discover organically. There were a lot of wacky ones around externalizing services, but not as many as you might think. Organizing into services taught teams not to trust each other in most of the same ways they're not supposed to trust external developers.

This effort was still underway when I left to join Google in mid-2005, but it was pretty far advanced. From the time Bezos issued his edict through the time I left, Amazon had transformed culturally into a company that thinks about everything in a services-first fashion. It is now fundamental to how they approach all designs, including internal designs for stuff that might never see the light of day externally.

At this point they don't even do it out of fear of being fired. I mean, they're still afraid of that; it's pretty much part of daily life there, working for the Dread Pirate Bezos and all. But they do services because they've come to understand that it's the Right Thing. There are without question pros and cons to the SOA approach, and some of the cons are pretty long. But overall it's the right thing because SOA-driven design enables Platforms.

That's what Bezos was up to with his edict, of course. He didn't (and doesn't) care even a tiny bit about the well-being of the teams, nor about what technologies they use, nor in fact any detail whatsoever about how they go about their business unless they happen to be screwing up. But Bezos realized long before the vast majority of Amazonians that Amazon needs to be a platform.

You wouldn't really think that an online bookstore needs to be an extensible, programmable platform. Would you?

Well, the first big thing Bezos realized is that the infrastructure they'd built for selling and shipping books and sundry could be transformed into an excellent repurposable computing platform. So now they have the Amazon Elastic Compute Cloud, and the Amazon Elastic MapReduce, and the Amazon Relational Database Service, and a whole passel' o' other services browsable at aws.amazon.com. These services host the backends for some pretty successful companies, reddit being my personal favorite of the bunch.

The other big realization he had was that he can't always build the right thing. I think Larry Tesler might have struck some kind of chord in Bezos when he said his mom couldn't use the goddamn website. It's not even super clear whose mom he was talking about, and doesn't really matter, because nobody's mom can use the goddamn website. In fact I myself find the website disturbingly daunting, and I worked there for over half a decade. I've just learned to kinda defocus my eyes and concentrate on the million or so pixels near the center of the page above the fold.

I'm not really sure how Bezos came to this realization -- the insight that he can't build one product and have it be right for everyone. But it doesn't matter, because he gets it. There's actually a formal name for this phenomenon. It's called Accessibility, and it's the most important thing in the computing world.

The. Most. Important. Thing.

If you're sorta thinking, "huh? You mean like, blind and deaf people Accessibility?" then you're not alone, because I've come to understand that there are lots and LOTS of people just like you: people for whom this idea does not have the right Accessibility, so it hasn't been able to get through to you yet. It's not your fault for not understanding, any more than it would be your fault for being blind or deaf or motion-restricted or living with any other disability. When software -- or idea-ware for that matter -- fails to be accessible to anyone for any reason, it is the fault of the software or of the messaging of the idea. It is an Accessibility failure.

Like anything else big and important in life, Accessibility has an evil twin who, jilted by the unbalanced affection displayed by their parents in their youth, has grown into an equally powerful Arch-Nemesis (yes, there's more than one nemesis to accessibility) named Security. And boy howdy are the two ever at odds.
