Bob La Quey wrote:
>> Assuming you (a) have a customer who knows what they want,
> Of course you do not have a customer who knows what they want.
> That is one reason why definition is hard and requires a lot
> of face to face.
Oh, just to clarify the sort of thing I meant when I said the "customer"
might not know what they want, and if they do they might not be able to
tell you:
In my current job, the original spec went something like this. (Mildly
vague because it's still a couple weeks before we're live and public.)
We're going to start out in 3 cities, and maybe eventually 300 cities or
so, but maybe not. In each city, we're going to have somewhere between 5
and 150 high-speed data streams, and an equal number of low-speed data
streams. The data streams come in over a custom PCI card, which isn't
built yet, but we're pretty sure it'll have drivers available for at
least one of the operating systems we know runs on the used hardware
we'll be buying once all the deals are put together. The high-speed data
streams need to be processed by a piece of software from a German
company we're still negotiating with, so we don't know if it's a
library, a server, or what. The low-speed data streams each have a
format which must be reverse-engineered from samples - i.e., you must
guess what it is. In addition, it'll change at times, and sometimes the
people providing the stream will simply change it for their amusement,
inserting random commentary in the middle of the data.
The library that processes the streams comes from a company full of
physicists, so they don't understand that a SIGSEGV is almost always the
fault of the C++ code they wrote; instead, they'll insist it must be
because we have multiple Linux kernels installed on the disk, even tho
the software merely does complex math on data files. There's no logging
in their server, so they can't track down their crashes, so they simply
cure the problem by ignoring all signals, conveniently making it
impossible to cleanly shut down their server once it starts. And it
takes several minutes to come back up (which of course one couldn't tell
before getting the software and writing enough of the system to fill it
up). So we'll want to run two of each server, and insert records into
each, except that their client library, while asking you to identify
which server you want to use each time, ignores it every time except the
first time. They have no version control, so if you find a bug in a
month-old version you've tuned your software to work well with, they
cannot give you the same version with the bug fixed, and you have to
take the latest version, which often has a completely different set of
client interfaces. They take a variety of arguments on the command line,
but fault out if any don't match what they recommended, so any change in
what you want to do with their server requires a round-trip engineering
change, recompilation in Germany, upgrading to the new version of the
client APIs, etc. They are, unfortunately, the only company in the world
that does what they do.
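(For what it's worth, their "cure crashes by ignoring signals" trick is easy to reproduce. This is a hypothetical sketch, not their code, and it's POSIX-only: a process that installs SIG_IGN for the usual termination signals simply can't be shut down cleanly anymore.)

```python
# Sketch of the "cure crashes by ignoring signals" anti-pattern (POSIX
# only). A server that sets SIG_IGN for the usual termination signals
# can no longer be shut down cleanly; SIGKILL is all that's left,
# which means no flushing, no cleanup, no orderly restart.
import signal
import subprocess
import sys
import time

CHILD = """
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)  # ignore polite shutdown
signal.signal(signal.SIGINT, signal.SIG_IGN)
time.sleep(60)
"""

child = subprocess.Popen([sys.executable, "-c", CHILD])
time.sleep(0.5)                    # let the child install its handlers
child.send_signal(signal.SIGTERM)  # the clean-shutdown request...
time.sleep(0.5)
ignored = child.poll() is None     # ...which it ignored
child.kill()                       # only SIGKILL works now
child.wait()
print(ignored)  # True
```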
Then the customers connect in over one network owned by S, delivering a
large bolus of data each time. The company providing that network (S)
has insufficient engineering resources, so they simply ignore bug
reports, refuse to make any sorts of changes you've already paid for,
and in general do things like require invalid XML as the interface into
their system and use SOAP libraries that are specifically coded to only
talk to the same SOAP library, rather than actually following the
standard. S describes the data they will provide for us from each
customer. They lie, and admit they are unable to provide any of the
metadata they promised, after the system is up and running enough to
accept their data and we realize they're providing all zeros in those
fields, making us have to replicate the work for each customer at each
city (3, or 300) instead of doing it only in the city where the customer
is. They are, unfortunately, the only company in the world that does
what they do.
The boss wants to support 10 customer requests per second, which you can
fit over a T1, so you design the system around that, until three months
later the boss wants to support 400 customer requests per second because
he made a deal with someone to get the funding you need to build it in
the first place. This requires moving parts of the system next to the
NOC of the "S network" from the previous paragraph.
The customer requests get matched against the data streams from the
cities, which might require sending the request out to the city or might
require piping all the city streams back to the customer-serving
computer, depending on (1) how much the German software compresses data,
(2) how many customers are calling in, and (3) how many cities have your
custom PCI cards in them.
We also have a data stream from company M which, altho it's XML, company
M is unable to document what any of the attributes or tags actually
mean, beyond what's implied by the names of the tags. XML in this case is simply
used as a mechanism to avoid having to document how to read the data.
The library to access it is opaque. It also requires providing a buffer
size, but M doesn't put any upper limit on the size of the buffer. The
client library does not release the memory it uses, so you have to run
for a bit, exit, and restart. About five times a day, one of the records
will be corrupt and unparsable XML. The records include timestamps but
no timezone data. Occasionally, the connection to company M will get
randomly deconfigured, and your connection to M will be off the air for
a while while they get around to resetting it, not that this was part of
M's documentation of course. Company M is, unfortunately, the only
company in the world that does what they do.
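(About the only defense against a feed like M's, where roughly five records a day are corrupt, is to treat every record as suspect. A hypothetical sketch; the record format here is invented:)

```python
# Sketch: parse a stream of XML records, skipping corrupt ones
# instead of letting one bad record take the whole feed down.
# The record format is invented for illustration.
import xml.etree.ElementTree as ET

def parse_records(records, log=print):
    """Yield parsed elements; log and skip records that aren't valid XML."""
    for i, raw in enumerate(records):
        try:
            yield ET.fromstring(raw)
        except ET.ParseError as e:
            log(f"record {i} unparsable, skipping: {e}")

feed = [
    "<rec id='1'><val>ok</val></rec>",
    "<rec id='2'><val>truncat",          # one of the daily corrupt records
    "<rec id='3'><val>also ok</val></rec>",
]
good = [r.get("id") for r in parse_records(feed, log=lambda m: None)]
print(good)  # ['1', '3']
```

(The leak in their client library gets the same treatment at a coarser grain: run the reader in a child process, exit it periodically, restart.)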
We also get a *third* type of low-speed data stream from various
places in various cities. Those streams are also fairly willy-nilly, in
that while the data is the best available, none of it is actually
authoritative. Like on this particular stream, between 6AM and 9AM
weekdays, the data should be ignored and company M's data should be used
instead, except when the data starts with a Z. Others will be like that
too, but we don't know which until we get the streams in and see. The
data is always timely, unless it's not, but you can't tell that. And the
actual content of the data differs from that provided by M, so you can't
really match it up in any good way. Usually. Oh, and each of these
streams is a different partner with specific deals and advertising
models and so on, so they'll want to be able to customize things
throughout the system when it's their stream we're using, but we don't
know which kinds of things. We'll let you know what we need to do after
we sign the contracts promising to do it. By the way, make sure the
software that's accepting the connections every few seconds for these
low-speed data streams never breaks, because it *takes out your
partner's servers* if they can't connect to you after they've configured
that connection, and your partner goes off the air until you come back
online. Not that you know this until a power supply fails in such a way
as to trip the breaker on the rack's UPS and powers down all your
servers. Remember to include that in your spec.
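(Rules like "ignore this stream 6AM-9AM weekdays unless the record starts with Z" end up as a per-stream table of predicates. A hypothetical sketch; the stream name and everything beyond the 6-9AM rule are invented:)

```python
# Sketch: per-stream override rules deciding whether to ignore a record
# in favor of company M's data. Only the 6-9AM-weekday/Z rule comes from
# the actual system; the stream name and structure are invented.
from datetime import datetime

def use_m_instead(stream, ts, payload):
    """Return True if this record should be ignored in favor of M's feed."""
    if stream == "stream_a":
        weekday = ts.weekday() < 5            # Mon-Fri
        morning = 6 <= ts.hour < 9
        if weekday and morning and not payload.startswith("Z"):
            return True
    # Other streams get rules like this too -- we just don't know
    # which rules until the streams arrive and we look.
    return False

print(use_m_instead("stream_a", datetime(2004, 6, 7, 7, 30), "ABC"))  # True (Monday, 7:30AM)
print(use_m_instead("stream_a", datetime(2004, 6, 7, 7, 30), "Z99"))  # False
```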
We match up the real-time customer data coming in from S's network to
the real-time data coming in on the high-speed data streams, correlate
with the somewhat-real-time low-speed data streams, and generate a web
page with the results. The contents of the web page have to be approved
by all of the other customers of S, in advance, with whatever changes
they want, while live and available to the public, before S will allow
us to use S's network, in spite of the fact that S's network has nothing
to do with the Internet, doesn't carry internet traffic, and S's other
customers don't carry internet traffic.
Whether the matching of data occurs in the individual cities or in the
central servers depends on whether it's less bandwidth to ship the
customer data to the cities or the city data to the customer server. Of
course, since we don't know how much data the German software generates
nor how many customers we'll be processing within a couple orders of
magnitude, we'll have to decide that a few months from now. Make sure
you order the right number of T1's, which also take a month or two to
install. We'll let you know where we're putting the servers after we
sign the leases, at which point we'll be able to tell you how many
high-speed data streams are available at that location.
By the way, the matching is also heuristic. About 40% of the time,
you'll get 10 different matches, all with zero confidence. About 10% of
the time, you'll get multiple 100% confidence matches of conflicting
information. Make sure to figure out in advance of outsourcing what
algorithm will give you good results in spite of such results, which you
didn't know you'd have, and which the Germans were surprised you got,
and more surprised you figured out how to work around.
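(The shape of the workaround, if not the actual algorithm we use, is a resolution policy over (candidate, confidence) pairs: throw away all-zero result sets, and trust nothing when multiple "certain" answers conflict. A hypothetical sketch:)

```python
# Sketch: resolving the matcher's output. Per the description above,
# ~40% of queries return many zero-confidence matches and ~10% return
# conflicting 100%-confidence matches. This policy is invented for
# illustration, not the one actually deployed.
def resolve(matches):
    """matches: list of (candidate, confidence in 0..1). Return one candidate or None."""
    if not matches:
        return None
    certain = {c for c, conf in matches if conf >= 1.0}
    if len(certain) == 1:
        return certain.pop()           # exactly one fully-confident answer
    if len(certain) > 1:
        return None                    # conflicting "certain" answers: trust none
    best, conf = max(matches, key=lambda m: m[1])
    return best if conf > 0 else None  # all-zero confidence: no answer at all

print(resolve([("a", 0.0), ("b", 0.0)]))  # None
print(resolve([("a", 1.0), ("b", 1.0)]))  # None
print(resolve([("a", 0.3), ("b", 0.8)]))  # b
```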
Oh, by the way, the web-services database you're contractually obligated
to fall back to when you don't get any matches at all? It's mostly
answers your customers are sure not to be interested in, so that'll give
ten bogus matches of zero confidence also, except that about 30% of the
time the right match is in the list one or more times. So by the way,
three weeks before launch, we decide we need to build our own database
to match against too. In spite of not having any actual authoritative
data to build it with. Do take care of that, won't you? Make sure you
have at least two weeks of data in it when we launch. Note that we don't
have any machines to actually run this database on, as they're all
repurposed to things like the backup servers for when the German
software crashes and we have to use the hot spare while the primary
comes back up. All that's left are four machines which don't see the
custom cards, two with disks too small to hold an hour's worth of data,
and two with huge disks which (for yet-unknown reasons) will only run one
disk at about 5% of the speed it's capable of, in spite of buying a new
controller card and new disks.
Then we also offer links to "affiliates" on those web pages, in order
that we get paid for all this. Some of the links are obtained by
downloading and processing a 6-gig CSV data file which may or may not
have changed, which gets updated once a week but which the provider
contractually requires you download daily, via FTP, to an FTP server
which only serves files which do not appear in directory listings and
not those files which *do* appear in their directory listings, ensuring
you cannot look at the timestamp or size on the file to see if it has
changed. (And if you try to download the file that *does* appear in the
directory listing, they lock out your account, for reasons still
unknown.) Since you have to download the entire file every time,
remember to spend the 3 hours of compute time (after the 5 hours of
downloading) to figure out which rows have changed, without actually
impacting the processing of the system, a system which hasn't been
built yet and whose performance you therefore don't know. (Note, incidentally,
that 6G is a fairly small lump of data for this system. I not uncommonly
wind up with individual files I process that take >30 seconds to delete.)
Other affiliates are equally f'ed in other undocumented ways, but each
of them is the only company in the world that provides the data they do.
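(The which-rows-changed step doesn't need the whole 6 gigs in memory: stream both files, keep one hash per row keyed on something like the first column, compare. A hypothetical sketch; the key column and file layout are assumptions:)

```python
# Sketch: find which rows of a huge CSV changed since the last download
# by streaming both files and comparing per-row hashes keyed on the
# first column. Memory cost is one hash per row, not the whole file.
# The choice of key column is an assumption for illustration.
import csv
import hashlib

def row_hashes(lines):
    """Map key column -> hash of the whole row, one row at a time."""
    out = {}
    for row in csv.reader(lines):
        if row:
            out[row[0]] = hashlib.sha1(",".join(row).encode()).hexdigest()
    return out

def changed_keys(old_lines, new_lines):
    old, new = row_hashes(old_lines), row_hashes(new_lines)
    return sorted(k for k in new if old.get(k) != new[k])

old = ["id1,foo,1", "id2,bar,2", "id3,baz,3"]
new = ["id1,foo,1", "id2,bar,9", "id4,qux,4"]
print(changed_keys(old, new))  # ['id2', 'id4']
```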
By the way, which affiliates we present to the customer depends on the
browser and ISP the customer is using, and sometimes the physical
location of the customer. We don't have the data that does that
matching, but we might be able to get it from someone sometimes, and
perhaps volunteers will keep it up to date for us. Or we can guess that
the customer is probably in the same city as the data he most recently
matched against, maybe, if we can tell what that is. Unless he never
talked to us before.
We use multiple mechanisms to reach out to customers, and the mechanism
that works is dependent on the service provider of the customer. We
don't always know the service provider (as some customers we talk to
before they ever try to contact us), and sometimes when they connect in,
our provider will lie about what service provider the customer is using
if they recently changed service providers. We also don't know which
mechanism works on which service provider, because the service providers
actively try to keep you from reaching your customers, and the various
mechanisms try to work around the blocks in various ways.
Make sure, by the way, that you spec out for outsourcing exactly how you
implement the processing for each affiliate, given that the system will
be live with customers before the contracts with the affiliates are
signed, meaning the affiliates won't disclose their super-secret APIs.
Oh, and S also provides data, indexed with yet another non-authoritative
string. The strings used for indexing are in a variety of character
sets, but S doesn't bother to actually make note of which records use
which character sets, so pretty much anything but ASCII is f'ed anyway.
Hope nothing comes in that corrupts your database. Assuming
you can get updates from them, which randomly stop working until you
work your way up the management ladder to where they finally admit that
since you prepaid them a large lump sum, maybe they should assign a
programmer to figuring out why their services randomly stop working for
a day or two at a time. Make sure you remember to spec that your updates
have to work in spite of that.
And S, upon hearing you've finished coding to the APIs they provided a
couple months ago, decide that the person who provided the APIs should
have provided different APIs, and insist you rework the message flow
through the entire system, because they're contractually obligated to
know what happened with the customer asking for access to their database
and be able to audit and prove the customer did or did not get certain
messages from our system. Which they didn't mention until that part of
the system was already written, tested, and in production.
By the way, we don't know how long it takes for the German software to
do any processing. Nor do we know how timely M's data is. But we need to
provide the answer to the user within a minute, or it's not a viable
business model. Does that make a difference? How many computers will we
need? Because we need to get rack space and power two months in advance.
Now, do me a favor, and write down a complete and accurate spec for how
to build that system, so I can outsource it. Oh, and let me know how
long it'll take before you start coding. And how many computers,
because we'll get the funding to actually sign contracts with all these
entities once we know how much it'll cost to build. We'll schedule our
nationwide advertising blitz to coincide with your estimates. Thanks!
Note that I'm not exaggerating any of this.
Note that this is also not a cut at my bosses or anything like that.
It's just a complicated system being put together with a variety of
constraints that are difficult to work around. The bosses are great, and
very understanding, and doing an outstanding job with their part of the
responsibilities for making this a success.
*** *** ***
When you have a nice self-contained build-from-scratch system, it's not
too hard to come up with a spec, then implement to it. That's what I do
all the time, in that situation. One might even believe outsourcing in
such a situation might be worthwhile, if you could figure out which
outsourcing companies were reliable enough to actually implement what
you spec'ed.
When you have a full-blown system whose value is in putting together a
bunch of pieces that nobody ever used that way before, it's more than a
bit harder.
--
Darren New / San Diego, CA, USA (PST)
It's not feature creep if you put it
at the end and adjust the release date.
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg