Battling Bugs: A Digital Quagmire
By Simson Garfinkel

Story location: 
http://www.wired.com/news/technology/bugs/0,2924,69369,00.html

02:00 AM Nov. 09, 2005 PT

In 1976, computer pioneer Edsger W. Dijkstra made an observation that would
prove uncanny: "Program testing can be quite effective for showing the
presence of bugs," he wrote in an essay, "but is hopelessly inadequate for
showing their absence."

Thirty years later, Dijkstra's words have the ring of prophecy. Companies like
Microsoft and Oracle, along with open-source projects like Mozilla and
Linux, have all instituted rigorous and extensive testing programs, but bugs
just keep slipping through. Last month, Microsoft's monthly drop of bug
patches included fixes for 14 security holes that escaped prerelease
testing, four of them rated "critical."

On Tuesday the company fixed three more Windows bugs, and all three were the
same basic genus of bug -- the "buffer overflow" -- that helped spread the
first internet worm in 1988. It seems programmers and software architects
manage to make the same mistakes generation after generation. Even back in
1988, many of the bugs that haunt us today were already old hat.

"We solved buffer overflows and the Y2K problem with Multics in 1975," says
Peter Neumann, a senior scientist at SRI International who has been
researching bugs and their impact on society for more than two decades. But
while Multics -- the first secure multi-user operating system -- addressed
some thorny problems, bug history keeps repeating itself.

The reasons for that are both simple and complex, experts say, having to do
with the programming languages themselves or with programmer psychology and
the environment in which software is developed. To understand why bugs
occur, it helps to start by looking at the general classes of faulty code.

Bugs can be broadly divided into two categories. The first covers
typographical slips and errors in reasoning -- mistakes in the code itself.
The second covers deeper, conceptual bugs that make a program malfunction
even though all the code is more or less correct.

Memory misdeeds

Buffer overflows and race conditions are examples of the first kind of bug.
A particularly tenacious beast, a buffer overflow becomes possible when a
programmer allocates a fixed amount of memory to hold a piece of information
-- for example, nine characters for a Social Security number -- but the
running program then tries to store more data in that space. The excess data
overflows the pre-allocated buffer and overwrites something else in the
computer's memory, frequently with disastrous results.
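
In C or C++ the mistake takes only a few lines. Here is a minimal sketch of
the overflow just described -- the variable names, buffer sizes and input are
illustrative, not drawn from any real program:

    // A nine-character Social Security number field, with an unrelated value
    // that happens to live nearby in memory.
    #include <cstdio>
    #include <cstring>

    int main() {
        char account_note[16] = "balance: 1000";   // innocent neighboring data
        char ssn[10];                               // nine digits plus a '\0'
        const char *input = "123-45-6789";          // 11 characters -- longer
                                                    // than the programmer assumed

        std::strcpy(ssn, input);    // writes past the end of 'ssn': undefined
                                    // behavior that can silently overwrite
                                    // whatever sits next to it in memory

        std::printf("%s / %s\n", ssn, account_note);
        return 0;
    }

Whether the stray bytes land in the neighboring note, in a saved return
address or somewhere else entirely depends on the compiler and the machine --
which is exactly what makes the bug so hard to catch in testing.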

The 1990s saw buffer overflows reach near-epidemic proportions in programs
written in the C and C++ programming languages, because these languages
require coders to manually manage the memory used by their programs. Like
driving a performance car, control of the memory might let a skilled
programmer eke a bit more out of the computer, or accomplish neat tricks and
stunts. But the danger of a stall or a crash is ever present.

Buffer overflows are particularly disconcerting, say experts in computer
security, because many of them can be manipulated by a skillful attacker into
executing arbitrary code supplied from outside the original program. This is
frequently called an "exploit."

"Everyone has read about the Titan Rain group that is running around," says
Jack Danahy, president and CEO of Ounce Labs. Disclosed in a Time magazine
report, Titan Rain is the name federal investigators gave an ongoing series
of U.S.-targeted hack attacks that officials traced to computers in China.
The attackers reportedly copied sensitive files and software, and though
they haven't accessed anything classified, Danahy theorizes they might be
able to find exploitable bugs in the software they've downloaded. "Little
publicity is given that they stole the flight planning software for the U.S.
Army," he says. "The capacity to find something interesting inside that code
(might have) a far-reaching impact in terms of how that system is used."

'Type-safe' languages inadequate

Partly in reaction to memory errors, other languages such as Java, Python
and Perl incorporated a feature called automatic memory management --
effectively taking some control away from programmers. With these so-called
"type-safe" languages, attempting to copy 16 characters of data into a
region of memory that can hold only nine might result in the region being
automatically extended, or it might generate a runtime error, but it will
never clobber the next item in the computer's memory.
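
The difference is easy to see in miniature. The sketch below stays in C++
for consistency with the earlier example and uses the bounds-checked
std::vector::at() to approximate the behavior described above -- the point is
the contrast, not the particular language:

    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    int main() {
        std::vector<char> ssn(9, '0');        // room for exactly nine characters
        std::string input = "123-45-6789";    // eleven characters -- too long

        try {
            for (std::size_t i = 0; i < input.size(); ++i)
                ssn.at(i) = input[i];         // every write is bounds-checked
        } catch (const std::out_of_range &e) {
            // The overlong copy is refused with a runtime error instead of
            // silently clobbering a neighbor in memory.
            std::cerr << "runtime error: " << e.what() << "\n";
        }
        return 0;
    }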

But like a game of whack-a-mole, the new languages didn't terminate the bugs
-- they just moved the glitches to other parts of the code, says Tom Ball, a
scientist at Microsoft Research who studies software reliability. "Type
safety doesn't eliminate all problems -- it eliminates one class of errors,"
says Ball. "It doesn't (for example) ensure that resources like locks are
properly used."
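
Ball's point about locks is easy to illustrate. The hypothetical account code
below -- again in C++ for consistency, though the same slip is just as easy to
make in Java -- compiles cleanly and corrupts no memory, yet one early-return
path leaves a lock held and the program hangs:

    #include <mutex>

    std::mutex balance_lock;
    int balance = 100;

    bool withdraw(int amount) {
        balance_lock.lock();
        if (amount > balance) {
            return false;            // BUG: returns while still holding the lock
        }
        balance -= amount;
        balance_lock.unlock();
        return true;
    }

    int main() {
        withdraw(500);   // overdraft: returns false but never releases the lock
        withdraw(10);    // hangs here, waiting on a lock no one will release
        return 0;
    }

Type safety doesn't catch this; the idiomatic fix -- a std::lock_guard in C++,
a try/finally or synchronized block in Java -- is a discipline the programmer
still has to apply.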

The Pentium precedent

With safer languages failing as a panacea, organizations invariably turn to
testing in an effort to flush out the errors before they leave the factory.
It is tempting to think that testing could find the vast majority of lurking
bugs, but that belief is itself in error, says MIT professor Daniel
Jackson, who researches techniques for using formal methods to prove the
correctness of computer programs, and is chairing a National Academies study
on software dependability.

Intel learned this the hard way in 1994, when a bug in the Pentium
microprocessor's floating point unit caused math errors to show up on some
calculations, but not others. The company couldn't hope to find the bug by
testing alone because there are simply too many floating point numbers to
test. Trying to divide every possible number that could be stored in the
Pentium's floating-point processor by every other possible number would take
longer than the estimated age of the universe. But once a scientist found
the bug by accident, the problem was widely publicized, and Intel's
customers wanted their Pentiums replaced with microprocessors that could do
floating-point math without mistakes. "That bug cost Intel $475 million,"
Jackson says.
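
The arithmetic behind that claim is easy to check. The back-of-the-envelope
sketch below assumes 64-bit operands and an optimistic one billion test
divisions per second; the Pentium's floating-point unit actually worked with
80-bit extended-precision values, which only widens the gap:

    #include <cmath>
    #include <cstdio>

    int main() {
        const double operands     = std::pow(2.0, 64);   // distinct 64-bit patterns
        const double pairs        = operands * operands; // every dividend/divisor pair
        const double per_second   = 1e9;                 // assumed test throughput
        const double seconds      = pairs / per_second;
        const double years        = seconds / (60.0 * 60.0 * 24.0 * 365.25);
        const double universe_age = 1.4e10;              // years, rough estimate

        std::printf("division pairs to test: %.1e\n", pairs);
        std::printf("years at 1e9 tests/sec: %.1e\n", years);
        std::printf("ages of the universe:   %.1e\n", years / universe_age);
        return 0;
    }

Even under those generous assumptions the run works out to roughly 10^22
years -- hundreds of billions of times the age of the universe.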

Ultimately, bugs in today's programs are not the result of too little
testing, says Jackson. Instead, they're caused by too much freedom given to
programmers. With the freedom to be creative comes the freedom to make
mistakes. That's why Jackson and other specialists believe that the secret
to having fewer bugs lies in taking much of that freedom away.

Conceptual errors

Perhaps most frustrating for those dreaming of a bug-free tomorrow is that
even if a program is coded perfectly, the software's purpose may still be
thwarted by another class of bug altogether -- conceptual bugs. These are
most frequently the result of an assumption made within the program that
doesn't match external reality.

So if you fix the buffer overflow in the program that processes Social
Security numbers, you're left with the problem that you're reading Social
Security numbers in the first place.

"Using Social Security numbers as a unique identifier is a bug," says Peter
Wayner, a computer consultant who has written several books about computer
security and cryptography. In this case, the gaffe is the assumption that
two people won't ever use the same number -- a bug that can confuse records
of two people, and is largely responsible for the modern crime of identity
theft.
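
The failure mode Wayner describes needs no broken code at all. In the
hypothetical sketch below -- the names and records are invented -- every line
does exactly what it was written to do; the bug lives entirely in the
assumption that one number identifies one person:

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // A record store that assumes SSNs are unique identifiers.
        std::map<std::string, std::string> person_by_ssn;

        person_by_ssn["123-45-6789"] = "Alice Example";
        person_by_ssn["123-45-6789"] = "Bob Example";   // same number, different
                                                        // person: the first record
                                                        // is silently replaced

        std::cout << person_by_ssn["123-45-6789"] << "\n";  // prints "Bob Example"
        return 0;
    }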

Bug eradication: no time soon

When all is said and done, the most successful technique for combating
software bugs may be to abandon any dream of eliminating them, says Jackson.

By way of example, Jackson says that during the summer of 2005, radiotherapy
machines at two U.S. hospitals were infected with computer viruses when the
Windows-based computers that controlled the machines were connected to the
hospital networks.

"Why was this small embedded system put on the network?" asks Jackson. The
hospitals were trying to directly integrate the machines with the rest of
the hospital's data network, but the computers hadn't been patched to resist
the latest virus.

Here's the catch: The radiotherapy machines probably couldn't be patched,
because doing so would have changed the computer's configuration and
required the medical software to be recertified. That's because installing a
security patch might itself introduce a bug that could make the machine
operate unsafely.

With so many layers of complexity for bugs to hide in, Jackson says the most
successful defensive move is not killing the bugs, but caging them -- by
limiting the functionality of computers running safety-critical systems.

"If you build a system that is critical, you are going to have to figure out
how to prioritize your requirements." In other words, the hospitals
shouldn't have put their radiotherapy machines on their networks, because
that was a requirement that had not been specified or adequately tested.

"Dependability," he says, "comes at a price."
