On 8/31/2018 3:50 PM, Walter Bright wrote:
> https://news.ycombinator.com/item?id=17880722
>
> Typical comments:
>
> "`assertAndContinue` crashes in dev and logs an error and keeps going in
> prod. Each time we want to verify a runtime assumption, we decide which
> type of assert to use. We prefer `assertAndContinue` (and I push for it
> in code review),"
>
> "Stopping all executing may not be the correct 'safe state' for an
> airplane though!"
>
> "One faction believed you should never intentionally crash the app"
>
> "One place I worked had a team that was very adamant about not really
> having much error checking. Not much of any qc process, either. Wait for
> someone to complain about bad data and respond. Honestly, this worked
> really well for small, skunkworks type projects that needed to be nimble."
>
> And on and on. It's unbelievable. The conventional wisdom in software
> for how to deal with programming bugs simply does not exist.
>
> Here's the same topic on Reddit with the same awful ideas:
>
> https://www.reddit.com/r/programming/comments/9bl72d/assertions_in_production_code/
>
> No wonder that DVD players still hang when you insert a DVD with a
> scratch on it, and I've had a lot of DVD and Bluray players over the
> last 20 years. No wonder that malware is everywhere.

All too true.

A while ago I worked for a large financial company. Many production systems had zero monitoring. A server with networking issues could continue to misbehave _for hours_ until someone somewhere noticed thousands of error messages and manually intervened.

There were also very few data quality checks. Databases could have duplicate records, missing records or obviously inconsistent information. Most systems just continued to process corrupt data as if nothing had happened, propagating it further and further. Some crucial infrastructure had no usable data backups.

With all this in mind, you would be surprised to hear how much they talked about "software quality". It's just that their notion of quality revolved around never letting a bug reach production and never bringing down any system. There were ever-increasing requirements around unit test coverage, opinionated coding standards and a lot of paperwork associated with every change.

Needless to say, it didn't work very well, and they had around half a dozen outages of varying sizes _every day_.

Alan Kay, Joe Armstrong and Jim Coplien are just a few of the famous people who have talked about this issue. It's amazing that so many engineers still don't get it. I'm inclined to put some blame on the recent TDD movement. Its proponents often seem to stress low-level code perfectionism while ignoring high-level architecture and runtime resilience (in other words, systems thinking).