Well, given that I work for a financial institution, I can say that in many cases "stopping everything" is exactly what DOES happen. "Charging ahead", knowing you're dealing with potentially corrupted data and not knowing the extent of the problem, is irresponsible.
Aside from the financial implications, we have responsibilities to the customers and regulatory agencies. Fortunately, situations like this are very rare, and the folks here are VERY good at fixing things quickly. But 0C7s in batch jobs do happen, and they still get fixed manually. For the REALLY critical stuff, parallel redundant systems are used (Tandem, etc.), on the theory that a single failure can't knock down more than part of the application.

In a previous job I worked for a hospital. Most of the systems we managed were NOT involved in direct patient care, and it's a good thing. When we DID start getting involved in that area, it became VERY scary.

> This is good *IF* it is not a critical system. If the application is
> moving billions of financial transactions around the world and it costs
> brokers millions of dollars for every minute of down time, just "stop
> everything, and let someone fix it" is not a good answer. The application
> needs to identify the failure point, establish what is likely good or bad
> data and charge ahead. (After leaving a solid trail of bread crumbs for
> someone to follow....)
>
> - Dale
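For what it's worth, Dale's "charge ahead, leave bread crumbs" idea can be sketched roughly like this (this is just an illustration in Python, not how any actual mainframe batch job is written; all the names here are mine):

```python
# Hypothetical sketch: validate each record, quarantine the bad ones with
# enough context to reconstruct what happened, and keep the batch moving.

def parse_amount(raw):
    """Raise ValueError on bad input (the rough analog of a 0C7 abend)."""
    return int(raw)  # blows up on non-numeric data

def run_batch(records):
    processed, quarantined = [], []
    for seq, raw in enumerate(records):
        try:
            processed.append(parse_amount(raw))
        except ValueError as exc:
            # The "bread crumbs": record position, raw data, and the error,
            # so someone can repair and replay just the failed records later.
            quarantined.append({"seq": seq, "raw": raw, "error": str(exc)})
    return processed, quarantined

good, bad = run_batch(["100", "2O0", "300"])  # "2O0" has a letter O in it
```

Of course, the hard part isn't the mechanics above; it's deciding, under regulatory scrutiny, whether the records you kept processing were actually safe to trust.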