Bob La Quey wrote:
Assuming you (a) have a customer who knows what they want,

Of course you do not have a customer who knows what they want.
That is one reason why definition is hard and requires a lot
of face to face.

Oh, just to clarify the sort of thing I meant when I said the "customer" might not know what they want, and if they do they might not be able to tell you:

In my current job, the original spec went something like this. (Mildly vague because it's still a couple weeks before we're live and public.)

We're going to start out in 3 cities, and maybe eventually 300 cities or so, but maybe not. In each city, we're going to have somewhere between 5 and 150 high-speed data streams, and an equal number of low-speed data streams. The data streams come in over a custom PCI card, which isn't built yet, but we're pretty sure it'll have drivers available for at least one of the operating systems we know run on the used hardware we'll be buying once all the deals are put together. The high-speed data streams need to be processed by a piece of software from a German company we're still negotiating with, so we don't know if it's a library, a server, or what. The low-speed data streams each have a format which must be reverse-engineered from samples - i.e., you must guess what it is. In addition, it'll change at times, and sometimes the people providing the stream will simply change it for their amusement, inserting random commentary in the middle of the data.

The library that processes the streams comes from a company full of physicists, so they don't understand that a SIGSEGV is almost always the fault of the C++ code they wrote; instead, they'll insist it must be because we have multiple Linux kernels installed on the disk, even tho the software merely does complex math on data files. There's no logging in their server, so they can't track down their crashes, so they simply cure the problem by ignoring all signals, conveniently making it impossible to cleanly shut down their server once it starts. And it takes several minutes to come back up (which of course one couldn't tell before getting the software and writing enough of the system to fill it up). So we'll want to run two of each server, and insert records into each, except that their client library, while asking you to identify which server you want to use each time, ignores it every time except the first time. They have no version control, so if you find a bug in a month-old version you've tuned your software to work well with, they cannot give you the same version with the bug fixed, and you have to take the latest version, which often has a completely different set of client interfaces. They take a variety of arguments on the command line, but fault out if any don't match what they recommended, so any change in what you want to do with their server requires a round-trip engineering change, recompilation in Germany, upgrading to the new version of the client APIs, etc. They are, unfortunately, the only company in the world that does what they do.
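For what it's worth, the "run two of each server and insert into both" workaround boils down to a simple fan-out. This is only a sketch with invented names (`dual_insert`, the client callables); the real vendor library's interface is nothing so tidy, and in practice each server needs its own dedicated process because of the "only the first server selection counts" bug:

```python
def dual_insert(record, clients):
    """Insert `record` into every redundant server, tolerating failures.

    `clients` is a list of callables, one per redundant server (hypothetical
    stand-ins for the vendor's client library). Returns how many inserts
    succeeded; raises only if *every* server rejected the record.
    """
    ok = 0
    errors = []
    for client in clients:
        try:
            client(record)
            ok += 1
        except Exception as exc:  # the real library fails in creative ways
            errors.append(exc)
    if ok == 0:
        raise RuntimeError(f"all inserts failed: {errors}")
    return ok
```

The point of tolerating partial failure is that one of the two servers is usually mid-restart for several minutes at a time.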

Then the customers connect in over one network owned by S, delivering a large bolus of data each time. The company providing that network (S) has insufficient engineering resources, so they simply ignore bug reports, refuse to make changes you've already paid for, and in general do things like require invalid XML as the interface into their system and use SOAP libraries that are specifically coded to talk only to the same SOAP library, rather than actually following the standard. S describes the data they will provide for us from each customer. They lie: only after the system is up and running enough to accept their data, when we realize they're providing all zeros in those fields, do they admit they're unable to provide any of the metadata they promised, forcing us to replicate the work for each customer in each city (3, or 300) instead of doing it only in the city where the customer is. They are, unfortunately, the only company in the world that does what they do.

The boss wants to support 10 customer requests per second, which you can fit over a T1, so you design the system around that, until three months later the boss wants to support 400 customer requests per second because he made a deal with someone to get the funding you need to build it in the first place. This requires moving parts of the system next to the NOC of the "S network" from the previous paragraph.
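The back-of-envelope math here is simple enough to write down (using the T1 line rate of 1.544 Mbps; real payload and protocol overhead would shave the budget further). A 40x jump in request rate cuts the per-request byte budget from roughly 19 KB to under 500 bytes, which is why the hardware had to move:

```python
T1_BPS = 1_544_000  # T1 line rate in bits per second (payload is slightly less)

def bytes_per_request(requests_per_second, link_bps=T1_BPS):
    """Rough per-request byte budget on a link, ignoring protocol overhead."""
    return link_bps / 8 / requests_per_second

# 10 req/s on a T1 leaves ~19,300 bytes per request;
# 400 req/s leaves ~482 bytes per request.
```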

The customer requests get matched against the data streams from the cities, which might require sending the request out to the city or might require piping all the city streams back to the customer-serving computer, depending on (1) how much the German software compresses data, (2) how many customers are calling in, and (3) how many cities have your custom PCI cards in them.

We also have a data stream from company M which, altho it's XML, company M is unable to document: they can't say what any of the attributes or tags actually mean, beyond what's implied by the names of the tags. XML in this case is simply used as a mechanism to avoid having to document how to read the data. The library to access it is opaque. It also requires providing a buffer size, but M doesn't document any upper limit on the size of the buffer. The client library never releases the memory it uses, so you have to run for a bit, exit, and restart. About five times a day, one of the records will be corrupt, unparsable XML. The records include timestamps but no timezone data. Occasionally, the connection to company M will get randomly deconfigured, and your connection to M will be off the air until they get around to resetting it, not that this was part of M's documentation of course. Company M is, unfortunately, the only company in the world that does what they do.
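Defensively skipping the handful of daily corrupt records is about the only sane response. A rough sketch, assuming the records arrive as individual strings and using Python's stock XML parser as a stand-in for M's opaque library:

```python
import xml.etree.ElementTree as ET

def parse_records(raw_records):
    """Parse a batch of XML records, skipping any that are corrupt.

    A few records a day arrive as unparsable XML, so a hard failure on one
    record must never take down the whole feed. Returns the parsed records
    and a count of the ones we had to drop (which you'd log in real life).
    """
    good, bad = [], 0
    for raw in raw_records:
        try:
            good.append(ET.fromstring(raw))
        except ET.ParseError:
            bad += 1  # skip it and move on; the stream never stops
    return good, bad
```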

We also get a *third* type of low-speed data stream from various places in various cities. Those streams are also fairly willy-nilly, in that while the data is the best available, none of it is actually authoritative. For example, on one particular stream, between 6AM and 9AM weekdays, the data should be ignored and company M's data used instead, except when the data starts with a Z. Others will be like that too, but we don't know which until we get the streams in and see. The data is always timely, unless it's not, but you can't tell that. And the actual content of the data differs from that provided by M, so you can't really match it up in any good way. Usually. Oh, and each of these streams comes from a different partner with specific deals and advertising models and so on, so they'll want to be able to customize things throughout the system when it's their stream we're using, but we don't know which kinds of things. We'll let you know what we need to do after we sign the contracts promising to do it. By the way, make sure the software that's accepting the connections every few seconds for these low-speed data streams never breaks, because it *takes out your partner's servers* if they can't connect to you after they've configured that connection, and your partner goes off the air until you come back online. Not that you know this until a power supply fails in such a way as to trip the breaker on the rack's UPS and power down all your servers. Remember to include that in your spec.
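The 6AM-to-9AM rule above, written down, looks like this. The function name and the exact rule shape are invented; the real point is that each stream ends up with its own hand-discovered variant of this, none of which appear in any spec:

```python
from datetime import datetime

def use_stream_record(record, ts):
    """Hypothetical per-stream override rule.

    On weekdays between 6AM and 9AM, ignore this stream in favor of
    company M's data, unless the record starts with 'Z'. Rules like this
    are discovered per stream after the data starts flowing, not specified.
    """
    is_weekday = ts.weekday() < 5      # Monday..Friday
    is_morning = 6 <= ts.hour < 9
    if is_weekday and is_morning and not record.startswith("Z"):
        return False  # fall back to company M's data instead
    return True
```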

We match up the real-time customer data coming in from S's network to the real-time data coming in on the high-speed data streams, correlate with the somewhat-real-time low-speed data streams, and generate a web page with the results. The contents of the web page have to be approved by all of the other customers of S, in advance, with whatever changes they want, while live and available to the public, before S will allow us to use S's network, in spite of the fact that S's network has nothing to do with the Internet, doesn't carry internet traffic, and S's other customers don't carry internet traffic.

Whether the matching of data occurs in the individual cities or in the central servers depends on whether it's less bandwidth to ship the customer data to the cities or the city data to the customer server. Of course, since we don't know how much data the German software generates nor how many customers we'll be processing within a couple orders of magnitude, we'll have to decide that a few months from now. Make sure you order the right number of T1's, which also take a month or two to install. We'll let you know where we're putting the servers after we sign the leases, at which point we'll be able to tell you how many high-speed data streams are available at that location.
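The decision itself is one comparison; the joke is that none of the inputs are known within a couple orders of magnitude. A sketch, with invented names, of the tradeoff as stated:

```python
def process_in_city(city_stream_bps, request_bps, compression_ratio):
    """Decide whether to match data in the city or at the central servers.

    Matching in the city means shipping customer requests out to it;
    matching centrally means piping the city's (compressed) streams back.
    Returns True if matching in the city moves fewer bits over the WAN.
    All three inputs here are guesses, which is exactly the problem.
    """
    stream_haul_bps = city_stream_bps / compression_ratio  # cost of piping streams in
    return request_bps < stream_haul_bps                    # cheaper to send requests out?
```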

By the way, the matching is also heuristic. About 40% of the time, you'll get 10 different matches, all with zero confidence. About 10% of the time, you'll get multiple 100%-confidence matches of conflicting information. Make sure to figure out, in advance of outsourcing, what algorithm will give you good results in spite of such output, which you didn't know you'd get, and which the Germans were surprised you got, and more surprised you figured out how to work around.
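One plausible shape for the workaround (and only that; the actual algorithm we used isn't in this message) is to treat both pathological cases as "no answer" rather than guessing:

```python
def filter_matches(matches):
    """Pick a usable answer from heuristic matcher output, or give up.

    `matches` is a list of (answer, confidence) pairs. The two bad cases
    from the matcher: many matches all at zero confidence (return None),
    and multiple conflicting matches all at 100% confidence (also None).
    Otherwise return the highest-confidence answer. A sketch, not the
    production algorithm.
    """
    confident = [(a, c) for a, c in matches if c > 0]
    if not confident:
        return None  # the ~40% case: all-zero-confidence noise
    certain = {a for a, c in confident if c >= 1.0}
    if len(certain) > 1:
        return None  # the ~10% case: conflicting "certain" answers
    return max(confident, key=lambda m: m[1])[0]
```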

Oh, by the way, the web-services database you're contractually obligated to fall back to when you don't get any matches at all? It's mostly answers your customers are sure not to be interested in, so that'll give ten bogus matches of zero confidence also, except that about 30% of the time the right match is in the list one or more times. So by the way, three weeks before launch, we decide we need to build our own database to match against too. In spite of not having any actual authoritative data to build it with. Do take care of that, won't you? Make sure you have at least two weeks of data in it when we launch. Note that we don't have any machines to actually run this database on, as they're all repurposed to things like the backup servers for when the German software crashes and we have to use the hot spare while the primary comes back up. All that's left are four machines which don't see the custom cards, two with disks too small to hold an hour worth of data, and two with huge disks but (for yet-unknown reasons) will only run one disk at about 5% of the speed it's capable of, in spite of buying a new controller card and new disks.

Then we also offer links to "affiliates" on those web pages, in order that we get paid for all this. Some of the links are obtained by downloading and processing a 6-gig CSV data file which may or may not have changed; it gets updated once a week, but the provider contractually requires you to download it daily, via FTP, from an FTP server which serves only those files which do *not* appear in directory listings, and refuses those which *do* appear, ensuring you cannot look at the timestamp or size of the file to see whether it has changed. (And if you try to download the file that *does* appear in the directory listing, they lock out your account, for reasons still unknown.) Since you have to download the entire file every time, remember to spend the 3 hours of compute time (after the 5 hours of downloading) to figure out which rows have changed, without actually impacting the processing of the system, a system which hasn't been built yet and whose performance you therefore don't know. (Note, incidentally, that 6G is a fairly small lump of data for this system. I not uncommonly wind up with individual files I process that take >30 seconds to delete.)
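The "which rows changed" step is the classic hash-diff trick: keep a set of per-row hashes from last time and stream the new file past it, so memory stays flat no matter how big the file gets. A sketch under those assumptions (invented function name; real code would also persist the hash set and handle the CSV quoting mess):

```python
import hashlib

def changed_rows(old_hashes, new_rows):
    """Find rows in the new download that weren't in the previous one.

    `old_hashes` is the set of per-row SHA-1 hex digests from last time;
    `new_rows` is an iterable of row strings (stream it from disk so a
    6-gig file never has to fit in memory). Returns the changed/new rows
    plus the new hash set to save for next time. Deleted rows would show
    up as hashes in `old_hashes` missing from the returned set.
    """
    new_hashes = set()
    changed = []
    for row in new_rows:
        digest = hashlib.sha1(row.encode()).hexdigest()
        new_hashes.add(digest)
        if digest not in old_hashes:
            changed.append(row)
    return changed, new_hashes
```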

Other affiliates are equally f'ed in other undocumented ways, but each of them is the only company in the world that provides the data they do. By the way, which affiliates we present to the customer depends on the browser and ISP the customer is using, and sometimes the physical location of the customer. We don't have the data that does that matching, but we might be able to get it from someone sometimes, and perhaps volunteers will keep it up to date for us. Or we can guess that the customer is probably in the same city as the data he most recently matched against, maybe, if we can tell what that is. Unless he never talked to us before.

We use multiple mechanisms to reach out to customers, and the mechanism that works is dependent on the service provider of the customer. We don't always know the service provider (as some customers we talk to before they ever try to contact us), and sometimes when they connect in, our provider will lie about what service provider the customer is using if they recently changed service providers. We also don't know which mechanism works on which service provider, because the service providers actively try to keep you from reaching your customers, and the various mechanisms try to work around the blocks in various ways.

Make sure, by the way, that you spec out for outsourcing exactly how you implement the processing for each affiliate, given that the system will be live with customers before the contracts with the affiliates are signed, meaning the affiliates won't disclose their super-secret APIs.

Oh, and S also provides data, indexed with yet another non-authoritative string. The strings used for indexing are in a variety of character sets, but S doesn't bother to note which records use which character sets, so pretty much anything but ASCII is f'ed anyway. Hope nothing comes in that corrupts your database. Assuming you can get updates from them at all, which randomly stop working until you work your way up the management ladder to where they finally admit that, since you prepaid them a large lump sum, maybe they should assign a programmer to figuring out why their services randomly stop working for a day or two at a time. Make sure you remember to spec that your updates have to work in spite of that.
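About the best you can do with undeclared character sets is guess and record the guess. A minimal sketch (the function name is invented, and a real system would try more candidate encodings than these two): try UTF-8, and fall back to Latin-1, which decodes any byte sequence without error but may mislabel the text, so tag the result for downstream code:

```python
def decode_index_string(raw: bytes):
    """Decode an index string of unknown character set.

    Tries strict UTF-8 first; on failure falls back to Latin-1, which
    never raises but may produce the wrong characters. Returns the text
    plus the name of the charset we guessed, so the guess is auditable.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"
```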

And S, upon hearing you've finished coding to the APIs they provided a couple months ago, decide that the person who provided the APIs should have provided different APIs, and insist you rework the message flow through the entire system, because they're contractually obligated to know what happened with the customer asking for access to their database and be able to audit and prove the customer did or did not get certain messages from our system. Which they didn't mention until that part of the system was already written, tested, and in production.

By the way, we don't know how long it takes for the German software to do any processing. Nor do we know how timely M's data is. But we need to provide the answer to the user within a minute, or it's not a viable business model. Does that make a difference? How many computers will we need? Because we need to get rack space and power two months in advance.

Now, do me a favor, and write down a complete and accurate spec for how to build that system, so I can outsource it. Oh, and let me know how long it'll take before you start coding. And how many computers, because we'll get the funding to actually sign contracts with all these entities once we know how much it'll cost to build. We'll schedule our nationwide advertising blitz to coincide with your estimates. Thanks!

Note that I'm not exaggerating any of this.

Note that this is also not a cut at my bosses or anything like that. It's just a complicated system being put together with a variety of constraints that are difficult to work around. The bosses are great, and very understanding, and doing an outstanding job with their part of the responsibilities for making this a success.

*** *** ***

When you have a nice self-contained build-from-scratch system, it's not too hard to come up with a spec, then implement to it. That's what I do all the time, in that situation. One might even believe outsourcing in such a situation might be worthwhile, if you could figure out which outsourcing companies were reliable enough to actually implement what you spec'ed.

When you have a full-blown system whose value is in putting together a bunch of pieces that nobody ever used that way before, it's more than a bit harder.

--
  Darren New / San Diego, CA, USA (PST)
    It's not feature creep if you put it
    at the end and adjust the release date.

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
