Battling Bugs: A Digital Quagmire

By Simson Garfinkel

Story location: http://www.wired.com/news/technology/bugs/0,2924,69369,00.html
02:00 AM Nov. 09, 2005 PT

In 1976, computer pioneer Edsger W. Dijkstra made an observation that would prove uncanny: "Program testing can be quite effective for showing the presence of bugs," he wrote in an essay, "but is hopelessly inadequate for showing their absence."

Thirty years later, Dijkstra's words have the ring of prophecy. Companies like Microsoft and Oracle, along with open-source projects like Mozilla and Linux, have all instituted rigorous and extensive testing programs, but bugs just keep slipping through.

Last month, Microsoft's monthly drop of bug patches included fixes for 14 security holes that escaped prerelease testing, four of them rated "critical." On Tuesday the company fixed three more Windows bugs, and all three were the same basic genus of bug -- the "buffer overflow" -- that helped spread the first internet worm in 1988.

It seems programmers and software architects manage to make the same mistakes generation after generation. Even back in 1988, many of the bugs that haunt us today were already old hat.

"We solved buffer overflows and the Y2K problem with Multics in 1975," says Peter Neumann, a senior scientist at SRI International who has been researching bugs and their impact on society for more than two decades.

But while Multics -- the first secure multi-user operating system -- addressed some thorny problems, bug history keeps repeating itself. The reasons for that are both simple and complex, experts say, having to do with the programming languages themselves or with programmer psychology and the environment in which software is developed.

To understand why bugs occur, it helps to start by looking at the general classes of faulty code. Bugs can be broadly divided into two categories. One type consists of typographical bugs and errors in reasoning. The other consists of deep, conceptual bugs that make a program malfunction even though all the code is more or less correct.

Memory misdeeds

Buffer overflows and race conditions are examples of the first kind of bug. A particularly tenacious beast, the potential for a buffer overflow is created when a programmer allocates a certain amount of memory to hold a piece of information -- for example, nine characters to hold a Social Security number. When the program actually runs, however, it tries to store more data in that space than the buffer can hold. The rest of the data overflows the pre-allocated buffer and overwrites something else in the computer's memory -- frequently with disastrous results.

The 1990s saw buffer overflows reach near-epidemic proportions in programs written in the C and C++ programming languages, because these languages require coders to manually manage the memory used by their programs. Like driving a performance car, control of the memory might let a skilled programmer eke a bit more out of the computer, or accomplish neat tricks and stunts. But the danger of a stall or a crash is ever present.

Buffer overflows are particularly disconcerting, say experts in computer security, because many of them can be gimmicked by a skillful attacker into executing arbitrary code supplied from outside the original program. This is frequently called an "exploit."
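To make the overflow concrete, here is a minimal C sketch of the failure just described. The Social Security-number buffer and the overlong input are hypothetical, and the unsafe strcpy() call simply stands in for whatever copy a real program might perform:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char ssn[10];                                /* nine digits plus a terminating null */
        const char *input = "123456789012345678";    /* 18 characters -- twice what fits */

        /* Unsafe: strcpy() copies until it reaches the input's null byte and
           never checks the destination's size. The extra bytes spill past the
           end of ssn[] into whatever happens to sit next to it in memory --
           another variable, or on the stack, possibly a return address that
           an attacker can point at code of their own choosing. */
        strcpy(ssn, input);

        printf("ssn = %s\n", ssn);
        return 0;
    }

A size-checked alternative such as snprintf(ssn, sizeof ssn, "%s", input) would truncate the input instead of overrunning the buffer -- roughly the discipline that the "type-safe" languages discussed below enforce automatically.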
"Everyone has read about the Titan Rain group that is running around," says Jack Danahy, president and CEO of Ounce Labs. Disclosed in a Time magazine report, Titan Rain is the name federal investigators gave to an ongoing series of U.S.-targeted hack attacks that officials traced to computers in China.

The attackers reportedly copied sensitive files and software, and though they haven't accessed anything classified, Danahy theorizes they might be able to find exploitable bugs in the software they've downloaded.

"Little publicity is given that they stole the flight planning software for the U.S. Army," he says. "The capacity to find something interesting inside that code (might have) a far-reaching impact in terms of how that system is used."

'Type-safe' languages inadequate

Partially in reaction to memory errors, other languages, such as Java, Python and Perl, incorporated a feature called automatic memory management -- effectively taking some control away from programmers. With these so-called "type-safe" languages, attempting to copy 16 characters of data into a region of memory that can only hold nine might result in the second region being extended, or it might generate a runtime error, but it would never clobber the next item in the computer's memory.

But like a game of whack-a-mole, the new languages didn't terminate the bugs -- they just moved the glitches to other parts of the code, says Tom Ball, a scientist at Microsoft Research who studies software reliability.

"Type safety doesn't eliminate all problems -- it eliminates one class of errors," says Ball. "It doesn't (for example) ensure that resources like locks are properly used."

The Pentium precedent

With safer languages failing as a panacea, organizations invariably turn to testing in an effort to flush out the errors before they leave the factory. But if it's tempting to think that testing could find the vast majority of lurking bugs, this belief is itself in error, says MIT professor Daniel Jackson, who researches techniques for using formal methods to prove the correctness of computer programs and is chairing a National Academies study on software dependability.

Intel learned this the hard way in 1994, when a bug in the Pentium microprocessor's floating-point unit caused math errors to show up on some calculations, but not others. The company couldn't hope to find the bug by testing alone because there are simply too many floating-point numbers to test. Trying to divide every possible number that could be stored in the Pentium's floating-point processor by every other possible number would take longer than the estimated age of the universe.

But once a scientist found the bug by accident, the problem was widely publicized, and Intel's customers wanted their Pentiums replaced with microprocessors that could do floating-point math without mistakes. "That bug cost Intel $475 million," Jackson says.

Ultimately, bugs in today's programs are not the result of too little testing, says Jackson. Instead, they're caused by too much freedom given to programmers. With the freedom to be creative comes the freedom to make mistakes. That's why Jackson and other specialists believe that the secret to having fewer bugs lies in taking much of that freedom away.
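The scale of the exhaustive-testing problem is easy to check with a back-of-the-envelope calculation. The figures below are illustrative assumptions rather than numbers from the article -- 64-bit operands and a generous one billion test divisions per second:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Assumptions: test every pair of 64-bit operands at one billion
           divisions per second; age of the universe taken as ~13.8 billion years. */
        double pairs      = pow(2.0, 64) * pow(2.0, 64);           /* ~3.4e38 dividend/divisor pairs */
        double per_second = 1e9;
        double years      = pairs / per_second / (60.0 * 60.0 * 24.0 * 365.25);
        double universe   = 1.38e10;

        printf("Years to test every pair: %.2g\n", years);                    /* ~1e22 years */
        printf("Multiples of the universe's age: %.2g\n", years / universe);  /* ~8e11 */
        return 0;
    }

Even with wildly optimistic hardware, the answer stays hundreds of billions of times longer than the universe has existed, which is why Intel could not have simply tested its way out of the problem.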
"Using Social Security numbers as a unique identifier is a bug," says Peter Wayner, a computer consultant who has written several books about computer security and cryptography. In this case, the gaffe is the assumption that two people won't ever use the same number -- a bug that can confuse records of two people, and is largely responsible for the modern crime of identity theft. Bug eradication: no time soon When all is said and done, the most successful technique for combating software bugs may be to abandon any dream of eliminating them, says Jackson. By way of example, Jackson says that during the summer of 2005, radiotherapy machines at two U.S. hospitals were infected with computer viruses when the Windows-based computers that controlled the machines were connected to the hospital networks. "Why was this small embedded system put on the network?" asks Jackson. The hospitals were trying to directly integrate the machines with the rest of the hospital's data network, but the computers hadn't been patched to resist the latest virus. Here's the catch: The radiotherapy machines probably couldn't be patched, because doing so would have changed the computer's configuration and required the medical software to be recertified. That's because installing a security patch might itself introduce a bug that could make the machine operate unsafely. With so many layers of complexity for bugs to hide in, Jackson says the most successful defensive move is not killing the bugs, but caging them -- by limiting the functionality of computers running safety-critical systems. "If you build a system that is critical, you are going to have to figure out how to prioritize your requirements." In other words, the hospitals shouldn't have put their radiotherapy machines on their networks, because that was a requirement that had not been specified or adequately tested. "Dependability," he says, "comes at a price." End of story You are a subscribed member of the infowarrior list. Visit www.infowarrior.org for list information or to unsubscribe. This message may be redistributed freely in its entirety. Any and all copyrights appearing in list messages are maintained by their respective owners.