These are definitely important considerations, Bruce,
and I am going to check out the papers referenced herein.

Thanks,
Chris


-----Original Message-----
From: Bruce Barkstrom <brbarkst...@gmail.com>
Reply-To: <dev@oodt.apache.org>
Date: Tuesday, September 23, 2014 6:16 AM
To: <dev@oodt.apache.org>, <alek...@gmail.com>
Subject: Some Further Thoughts on Configuration Management of Complex
Workflows

>While I won't claim to have done a thorough examination of the proposal
>to use the IBM tool for developing workflows, I am concerned about
>several items relating to configuration management.  Several articles in
>the new Comm. ACM bear on security and configuration management (CACM,
>Vol. 57, No. 9, 2014).  I'd highly recommend getting a copy and taking a
>look at the articles in the middle of the issue.
>
>1.  Kern, C., 2014: Securing the Tangled Web, CACM, 57, 38-67, presents
>a view of security issues due to script injection vulnerabilities that
>makes JSON and other technologies that use JavaScript less secure than
>one would like.  Kern is an information security engineer at Google.  He
>discusses not only the nature of the XSS vulnerabilities, but also work
>Google has undertaken to reduce their risk.  This includes building in
>special character exception handling, designing and testing automated
>templates for interface designers to use, and project management
>enforcement of strict disciplines that forbid use of vulnerable
>software.  Unfortunately, the cures add to the learning curve for using
>these tools - and increase the maintenance cost of software, because
>they need to be applied "forever".
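>
>As a concrete illustration of the escaping discipline Kern describes,
>here is a minimal sketch in Java.  The class and method names are my
>own invention, not Google's actual API - just the shape of the idea:
>
>    // Escape the characters that enable HTML/script injection before
>    // untrusted text is interpolated into a page.
>    public final class HtmlEscaper {
>        private HtmlEscaper() {}
>
>        public static String escapeHtml(String untrusted) {
>            StringBuilder sb = new StringBuilder(untrusted.length());
>            for (int i = 0; i < untrusted.length(); i++) {
>                char c = untrusted.charAt(i);
>                switch (c) {
>                    case '&':  sb.append("&amp;");  break;
>                    case '<':  sb.append("&lt;");   break;
>                    case '>':  sb.append("&gt;");   break;
>                    case '"':  sb.append("&quot;"); break;
>                    case '\'': sb.append("&#39;");  break;
>                    default:   sb.append(c);
>                }
>            }
>            return sb.toString();
>        }
>    }
>
>A template system that applies this kind of escaping automatically, in
>the right output context, is what takes the burden off individual
>developers - which is the point of the Google work Kern reports.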
>
>2.  Workflows (or, in normal project management nomenclature, Work
>Breakdown Structures) are graphs whose complexity increases markedly as
>more activities and objects get included.  If one is aiming for high
>integrity, or for fully replicable and transparent software systems, one
>must retain the ability to reproduce a configuration.  The old NCAR
>FORTRAN manuals (ca. 1980) had a cover that embedded the notion "It ran
>yesterday.  It's been running for years.  I only changed one card."
>This means that software that is updated (by revisions due to security
>concerns or to make other improvements) could require verification that
>the updates haven't changed numerical values.  Based on my personal
>experience with Ubuntu Linux (or Windows - whatever), updates occur on
>at least a weekly basis, with the organizations responsible for the
>software deciding when to send out updates.  This rate of update makes
>the Web a pretty volatile environment.  In most organizations, the
>system administrators bear the burden this turmoil creates.  End users
>may not realize the impact, but it costs time and attention to avoid
>being overwhelmed.
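>
>To make that verification concrete: the minimal guard is to rerun a job
>after every update and compare its numbers against a frozen baseline.
>A sketch in Java (the file names and tolerance are hypothetical, and
>whether "unchanged" means bit-for-bit or within a tolerance is itself a
>decision the project has to record):
>
>    import java.nio.file.Files;
>    import java.nio.file.Paths;
>    import java.util.List;
>
>    // Compare a fresh run's output against a stored baseline, value by
>    // value, and fail loudly if anything has drifted.
>    public class NumericRegressionCheck {
>        public static void main(String[] args) throws Exception {
>            List<String> baseline = Files.readAllLines(Paths.get("baseline.txt"));
>            List<String> current  = Files.readAllLines(Paths.get("current.txt"));
>            if (baseline.size() != current.size()) {
>                throw new AssertionError("output length changed");
>            }
>            final double tol = 1.0e-12;  // hypothetical tolerance
>            for (int i = 0; i < baseline.size(); i++) {
>                double b = Double.parseDouble(baseline.get(i));
>                double c = Double.parseDouble(current.get(i));
>                if (Math.abs(b - c) > tol) {
>                    throw new AssertionError(
>                        "line " + (i + 1) + " drifted: " + b + " vs " + c);
>                }
>            }
>            System.out.println("outputs match within tolerance");
>        }
>    }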
>
>3.  For many of the software packages we use, the organization providing
>the software manages package updates with a centralized package manager.
>In Linux, Debian (and the derivative Ubuntu family of software) uses one
>centralized manager to produce the .deb packages that contain the
>provenance metadata appropriate for maintaining configuration.  Red Hat
>and SuSE Linux use an alternative format, the RPM package, with its own
>metadata format.  These package managers do not operate in the same way.
>For example, if you want to ingest RPM packages into Ubuntu, you have to
>install a package called alien and use it to convert the RPM to .deb
>format (see the example command below).  The same pleasantries affect
>Java, databases, and Web standards.  Because some of these organizations
>are real commercial enterprises making their money from customers
>outside of the federal contracting venue, it seems unlikely that funding
>agencies will develop one common standard for configuration management.
>While funding agencies might think a single standard for configuration
>would solve their problems, that would require an unprecedented degree
>of cooperation between agencies, data producers, and data users.  The
>time scale for reaching agreements on this kind of "social engineering"
>is almost certainly at least a decade, during which the technological
>basis in hardware and software will have evolved out from under the
>agencies.
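>
>For reference, the conversion step looks roughly like this (the package
>name is hypothetical, and alien's options may vary by version):
>
>    sudo alien --to-deb somepackage.rpm
>
>The command emits a .deb, but the translated metadata is not guaranteed
>to carry the full provenance of the original RPM - which is exactly the
>configuration-management gap described above.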
>
>I suspect that the security issues relating to JSON and such are the
>immediate concern.  On a slightly longer time frame, it's important to
>remember that the way workflow complexity scales makes a single tool
>unlikely to serve all cases.  A solution for data production with short
>chains of objects that are relatively isolated (a single investigator
>conducting a few investigations per year) is vastly different from
>production flows such as weather forecasting or some kinds of climate
>data production (large teams of software developers and scientists,
>hundreds of people, running thousands of jobs per day).  Configuration
>management for the latter kind of project requires building group
>cultures that recognize the importance of managing the configuration -
>and that does take up a lot of time, even for the scientists involved.
>
>I won't say I'm sorry for the length of these comments.  Some issues
>can't be reduced to sound bites or bullets; the chain of reasoning for
>them is simply longer.
>
>Bruce B.

