These are definitely important considerations, Bruce, and I am going to check out the papers referenced herein.
Thanks,
Chris

-----Original Message-----
From: Bruce Barkstrom <brbarkst...@gmail.com>
Reply-To: <dev@oodt.apache.org>
Date: Tuesday, September 23, 2014 6:16 AM
To: <dev@oodt.apache.org>, <alek...@gmail.com>
Subject: Some Further Thoughts on Configuration Management of Complex Workflows

>While I won't claim to have done a thorough examination of the proposal
>to use the IBM tool for developing workflows, I am concerned about
>several items relating to configuration management. There are several
>articles in the new Comm. ACM that bear on security and configuration
>management (CACM, Vol. 57, No. 9, 2014). I'd highly recommend getting a
>copy and taking a look at the articles in the middle of the issue.
>
>1. Kern, C., 2014: Securing the Tangled Web, CACM, 57, 38-67, presents
>a view of security issues due to script injection vulnerabilities that
>makes JSON and other technologies that use JavaScript less secure than
>one would like. Kern is an information security engineer for Google. He
>discusses not only the nature of the XSS vulnerabilities, but also work
>Google has undertaken to reduce their risk. These measures include
>building in special-character exception handling, designing and testing
>automated templates for interface designers to use (sketched below,
>after item 2), and project-management enforcement of strict disciplines
>that forbid the use of vulnerable software. Unfortunately, the cures add
>to the learning curve for using these tools - and increase the
>maintenance cost of software, because they need to be applied "forever".
>
>2. Workflows (or, in normal project-management nomenclature, Work
>Breakdown Structures) are graphs whose complexity increases markedly as
>more activities and objects get included. If one is aiming for high
>integrity or for fully replicable and transparent software systems, one
>must maintain the ability to retain the configuration. The old NCAR
>FORTRAN manuals (ca. 1980) had a cover that embedded the notion "It ran
>yesterday. It's been running for years. I only changed one card."
>This means that software that is updated (by revisions due to security
>concerns or other improvements) could require verification that the
>updates haven't changed numerical values (also sketched below). Based on
>my personal experience with Ubuntu Linux (or Windows - whatever),
>updates occur on at least a weekly basis, with the organizations
>responsible for the software deciding when to send them out. This rate
>of update makes the Web a pretty volatile environment. In most
>organizations that have system administrators, the administrators bear
>the burden this turmoil creates. End users may not realize the impact,
>but it costs time and attention to avoid being overwhelmed.
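>
>To make the template discipline in item 1 concrete, here is a rough
>Java sketch of the kind of contextual output escaping an auto-escaping
>template applies before untrusted text reaches an HTML page. The class
>and method names are illustrative placeholders, not any particular
>library's API:
>
>    // Escape HTML metacharacters in untrusted input so that a payload
>    // such as <script>...</script> renders as inert text instead of
>    // executing as markup.
>    public final class HtmlEscape {
>        public static String escapeHtml(String untrusted) {
>            StringBuilder sb = new StringBuilder(untrusted.length());
>            for (int i = 0; i < untrusted.length(); i++) {
>                char c = untrusted.charAt(i);
>                switch (c) {
>                    case '&':  sb.append("&amp;");  break;
>                    case '<':  sb.append("&lt;");   break;
>                    case '>':  sb.append("&gt;");   break;
>                    case '"':  sb.append("&quot;"); break;
>                    case '\'': sb.append("&#39;");  break;
>                    default:   sb.append(c);
>                }
>            }
>            return sb.toString();
>        }
>
>        public static void main(String[] args) {
>            System.out.println(escapeHtml("<script>alert('xss')</script>"));
>        }
>    }
>
>The value of pushing this into the template layer is that no individual
>interface designer has to remember to call it - which is exactly the
>kind of enforced discipline Kern describes.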
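>
>Similarly, the verification in item 2 amounts to a numerical regression
>check: after every update, rerun the computation on fixed inputs and
>compare each value against a stored baseline within a tolerance. A
>minimal sketch - the file name, the tolerance, and the recompute()
>stand-in are placeholders for a real production run:
>
>    import java.io.IOException;
>    import java.nio.file.Files;
>    import java.nio.file.Paths;
>    import java.util.List;
>
>    public final class BaselineCheck {
>        static final double TOLERANCE = 1e-12;  // placeholder tolerance
>
>        // Stand-in for rerunning the real production computation on
>        // the same inputs that produced the baseline.
>        static double[] recompute() {
>            return new double[] { 1.0, 2.5, 3.75 };
>        }
>
>        public static void main(String[] args) throws IOException {
>            // One baseline value per line, written by a known-good run.
>            List<String> baseline =
>                Files.readAllLines(Paths.get("baseline.txt"));
>            double[] current = recompute();
>            int n = Math.min(baseline.size(), current.length);
>            for (int i = 0; i < n; i++) {
>                double expected = Double.parseDouble(baseline.get(i).trim());
>                double delta = Math.abs(expected - current[i]);
>                if (delta > TOLERANCE) {
>                    System.err.printf("value %d drifted by %g%n", i, delta);
>                }
>            }
>        }
>    }
>
>If a change to "one card" makes a delta show up, you find out before
>the results ship rather than after.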
>
>3. In many of the software packages we use, the organization providing
>the software manages package updates with a centralized package manager.
>In Linux, Debian (and the derivative Ubuntu family) uses one centralized
>manager to produce .deb packages that contain the provenance metadata
>appropriate for maintaining configuration. Red Hat and SuSE Linux use an
>alternative format, the RPM package, with its own metadata format. These
>package managers do not operate in the same way. For example, if you
>want to ingest RPM packages into Ubuntu, you have to install a package
>called alien and use that to convert the RPM to .deb format. The same
>pleasantries affect Java, databases, and Web standards. Because some of
>these organizations are real commercial enterprises making their money
>from customers outside of the federal contracting venue, it seems
>unlikely that funding agencies will develop one common standard for
>configuration management. While funding agencies might think a single
>standard for configuration would solve their problems, that would
>require an unprecedented degree of cooperation between agencies, data
>producers, and data users. The time scale for reaching agreements on
>this kind of "social engineering" is almost certainly at least a decade,
>during which the technological basis in hardware and software will have
>evolved out from under the agencies.
>
>I suspect that the security issues relating to JSON and such are the
>immediate concern. On a slightly longer time frame, it's important to
>remember that the complexity of workflow scaling makes a single tool
>unlikely. A solution for data production with short chains of objects
>that are relatively isolated (a single investigator conducting a few
>investigations per year) is vastly different from production flows such
>as weather forecasting or some kinds of climate data production (large
>teams of software developers and scientists - hundreds of people -
>running thousands of jobs per day). Configuration management for the
>latter kind of project requires building group cultures that recognize
>the importance of managing the configuration - and that does take up a
>lot of time, even for the scientists involved.
>
>I won't say I'm sorry for the length of these comments. Some issues
>can't be reduced to sound bites or bullets; the chain of reasoning for
>them is simply longer.
>
>Bruce B.