James Carlson wrote:
> Roland Mainz writes:
> > > I don't believe that we really need to keep around a museum piece like
> > > this.  /bin/oksh would be just a relic, entirely superseded by ksh93
> > > installed as /bin/ksh.  For what reason would anyone want to reminisce
> > > with the old ksh implementation?
> >
> > As described in the other postings: We think this is needed as a "safety
> > net".
> 
> In that case, I think the materials presented for ARC review should
> describe this issue in a great amount of detail.  Introducing such a
> thing is non-trivial.
> 
> The specific items I'd want to see in that review are:
> 
>   - A complete list of the known compatibility issues.  (I've seen a
>     couple of lists, but don't think I've seen a complete one yet.)

There is IMO a catch: It is likely impossible to generate a list which is
really complete. ksh88 differs from ksh93 in fundamental ways in both
implementation and design so there is no easy way to - for example -
compare the parser code and write up the differences. And this situation
becomes even more complicated with the Solaris version of /usr/bin/ksh
which evolved separately from ksh88 (it may be the shell-evolutionary
equivalent of comparing Homo sapiens sapiens with Pan paniscus
(=Bonobo). Long ago they had a common ancestor but they evolved in
totally different directions and at different speeds (due to different
environments)).

>   - Some discussion of why the incompatibilities are strictly
>     _required_, and aren't just an artifact of the implementation.
>     (I.e., "doing X means that Y must fail by design, and Y is a more
>     important feature" is a good sort of answer, but "maintaining X
>     was just too hard" likely is not.)

That sounds like a good idea. For some issues like the "dynamic vs.
static scoping" it's easy, other stuff like the removal of ERRNO may
require more discussion (OK... even for ERRNO it's easy since it was
clearly marked as for debugging purposes only and totally useless for
normal scripting).
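
To make the scoping issue concrete, here is a minimal sketch (not taken
from any real script; the function names show_x, caller and
scoping_model are made up for illustration). It assumes the usual
description of the difference: with ksh-style "function name { ... }"
definitions a dynamically scoping shell lets a called function see the
caller's typeset variable, while a statically scoping shell such as
ksh93 shows it the global one.

```shell
# ksh-style function syntax is used on purpose: POSIX-style name()
# functions follow different scoping rules in ksh93.
x=outer

function show_x {
    # Prints whichever "x" is visible from inside this function.
    echo "$x"
}

function caller {
    typeset x=inner    # function-local variable
    show_x
}

# Reports which scoping model the running shell applies.
scoping_model() {
    case "$(caller)" in
        inner) echo dynamic ;;   # callee saw the caller's local x
        outer) echo static ;;    # callee saw only the global x
        *)     echo unknown ;;
    esac
}
```

Running scoping_model under each shell in question shows which model it
uses. A script which silently relies on the "dynamic" behaviour parses
fine under both shells, which is why this class of problem only shows up
at runtime.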

>   - Some indication of what particular software is likely to fail.
>     Have we looked at any third party scripting to see if failures
>     will affect any significant user population?  I'm quite willing to
>     write off "probably never used" features, but less so for things
>     that we can easily predict will cause widespread problems.

That is the tricky part and the reason why I wrote the patch for "oksh".
Before we can actually do testing with the community and software
vendors we need a way for people to replace /usr/bin/ksh with ksh93
_safely_. Right now this does not work as this blows up inetd (via
|libc::wordexp()|) badly, rendering such systems useless in many cases.
And that's why we need this patch for Solaris Nevada: Without it we
can't get real-world results. Results from lab environments are good but
the results from real-world systems are more important.

>   - A description of how the "safety net" might plausibly be used.
>     I'm not talking about trivial cases here (such as old hacks the
>     user might have in $HOME/bin), but rather the hard ones, such as
>     third party code.  Just what does a user do?

See new thread ""ksh-version-switch"-script ?" ...

>   - Any thoughts about how the problems could be avoided would be
>     welcome.  Have we talked to any third-party software vendors to
>     get them to migrate?  Are there any validation tools ("appcert for
>     shell scripts?") that could be applied? 

Uhm... not really. Some changes can be caught by a test script - but
other things like my favourite example of "dynamic vs. static scoping"
can AFAIK only be tested at runtime... ;-(
The good news is that such obscure features are very rarely used -
digging through our library of scripts shows almost no problems (the two
scripts I found simply had brackets missing which the ksh88 parser
ignored - ksh93 is much more strict (IMO a good thing)) even with
six-year-old scripts... :-)
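
For the class of problems a test script _can_ catch (parser errors like
the missing brackets), a rough sketch of such a check could look like
this. It only relies on the standard "-n" (read commands but do not
execute them) shell option; the function name check_syntax and the paths
in the usage line are made up for illustration.

```shell
# check_syntax CHECKER FILE...
# Runs "CHECKER -n FILE" (parse only, nothing is executed) for each
# file and prints the names of the files the checker rejects.
# Returns 1 if at least one file was rejected, 0 otherwise.
check_syntax() {
    checker=$1
    shift
    failed=0
    for script in "$@"; do
        if ! "$checker" -n "$script" >/dev/null 2>&1; then
            echo "$script"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage would be something like "check_syntax /usr/bin/ksh93
/opt/vendor/bin/*.ksh" (both paths are only placeholders). This finds
syntax problems only; runtime differences such as the scoping issue are
out of reach for this kind of check.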

>     Is there any way that the
>     shell could detect these specific cases and log a warning or error
>     somewhere that might be noticed?  (I'm worried about scripts that
>     are buried several layers deep, behind other scripts and a GUI.
>     Failure there likely means huge debug expenses for users.)
> 
> The underlying issue here is that compatibility isn't just a goal for
> Solaris.  It's a constraint.  As an ARC reviewer, I want to be fully
> convinced that when we break things, it is because we have no
> plausible alternatives, not just because we think we can do it.

Ok... but it depends a little bit on how far "plausible alternatives"
can be stretched. The idea of customising ksh93 for
backwards-compatibility reasons does not sound very nice to me... on the
other hand upstream (AFAIK David Korn himself) said he is willing
to accept reasonable patches in this area. It would be nice if such
changes could be done in consensus with the ksh93/AST people, i.e.
keeping the Solaris ksh93 version from diverging from the normal
ones should have a very very high priority.

[snip]
> > By design ksh93 is not fully backwards-compatible to get rid of very
> > ugly (design) issues in ksh88. Software-vendors should be encouraged to
> > test and adjust (if necessary) their scripts (usually this porting is
> > not required if the scripts are running on multiple platforms so only
> > those products are affected which were written specifically for Solaris)
> > but there should be a way to do the adjustment quickly (e.g. via
> > switching from #!/bin/ksh to #!/bin/oksh to give the software developers
> > enough time to port their products (that's also the reason why I asked
> > in
> 
> This leaves out some significant details:
> 
>   - What does a user see?  How does an end user know that the problem
>     is a third-party application that needs to be updated rather than
>     a bug in Solaris that needs to be fixed?

I have no idea how this could be done since there is AFAIK no way to
distinguish between a binary shipped with Solaris and a 3rd-party binary
except by looking at the package database.

>   - Just how hard is it to fix these problems "the right way?"

It depends on how the scripts are written... that's why we need to do
the testing in real-world environments to find scripts which are really
broken (and not only the "missing bracket"-issue which causes scripts to
fail because the ksh93 parser is more strict) ...

>     If
>     it's just (say) a week's worth of effort to recode a broken script
>     to work with the new ksh, perhaps there's no point in providing an
>     alternative that could (and probably will) be abused.  (For most
>     vendors, the time required to test and deploy a fix -- even
>     /bin/oksh -- completely dominates the time to fix a problem.  And,
>     like Sun, vendors often stop supporting "old versions," meaning
>     that some users may well just be out of luck.)

Yes... but I do not expect that there are really so many broken scripts
out there... and EOLing "oksh" after three Solaris release cycles gives
developers and users another six or seven years to deal with the
problem... :-)

>   - What demands are placed on an administrator?  Using /bin/oksh
>     means that we'll need patches that introduce this object (or
>     symlink) on old releases, meaning that vendors in this position
>     will be forcing their customers through an ugly must-patch-first
>     (before script is rewritten) to no-patch-needed transition.  These
>     things are often very confusing for users and result in expensive
>     support calls.

Yes, but on the other hand OS patches are required for some
applications. I remember very well how bad Mozilla was at this point
since libCrun (C++ runtime)+i18n patches were mandatory (otherwise it
suffered from fun such as silent profile data corruption (and to make it
worse: The i18n patch was neither part of the "Recommended" nor the
"Security" cluster, making it an extraordinary pain to deal with the
issue)). The solution was to FORCE users to update (take a look at the
following screenshot which shows a dtksh script in action to rescue the
situation: https://bugzilla.mozilla.org/attachment.cgi?id=144824) ...

[snip]
> > > If it's really the case that ksh93 is "risky" to drop into place as
> > > /bin/ksh, and that's what we're trying to mitigate, then we ought to
> > > think long and hard about doing it in the first place.
> >
> > See my comment above. Software which already handles different platforms
> > should have little or no problems with ksh93 - only those scripts which
> > are specifically written (or contain workarounds (and only do platform
> > tests via "uname" or install Solaris-specific scripts on Solaris and a
> > normal script for all other OSes with normal versions of ksh)) for
> > Solaris ksh (which is some very special breed and not even ksh88
> > compatible!) _may_ need adjustments. Again: _MAY_.
> 
> If people are using uname this way today rather than feature tests
> (ugh!) or just avoiding known trouble areas, then that software _will_
> break when ksh93 is integrated as /usr/bin/ksh.

Yes... but should we really include workarounds for such a broken script
(BTW: testing whether the shell is a ksh93 one can easily be done via
probing for ${.sh.version} (e.g. [ -z "$( (echo ${.sh.version})
2>/dev/null)" ] etc. )) ?
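
Spelled out as a feature test, the probe could look like the following
sketch (the function name is_ksh93 is made up; the only assumption is
that ${.sh.version} expands to a non-empty string on ksh93 and is
rejected as a bad substitution elsewhere):

```shell
# Returns 0 if the running shell expands ${.sh.version}, 1 otherwise.
is_ksh93() {
    # - eval defers parsing, so shells which reject the name outright
    #   only fail inside the subshell
    # - the space in "$( (" keeps the shell from reading "$((" as the
    #   start of an arithmetic expansion
    [ -n "$( (eval 'echo "${.sh.version}"') 2>/dev/null )" ]
}

if is_ksh93; then
    echo "running under ksh93"
else
    echo "running under some other shell"
fi
```

A script can use this instead of "uname" to decide whether a
ksh93-specific code path or a workaround is needed.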

> > The full impact of the change can really only be measured if we give the
> > OpenSolaris community an OS/Net version where they can safely put ksh93
> > into /bin/ksh and try what happens then (this has already been tried on
> > a smaller scale with AFAIK two of the OpenSolaris distributions with
> > good results and very good feedback (except the inetd hiccup which is
> > simply a direct result of the |libc::wordexp()| problem)).
> 
> I don't see the point in running an experiment here.  Users who wish
> to do this can already do it themselves with "rm" and "ln."  No
> special changes are required.

Erm... no. It is not possible to replace /usr/bin/ksh with ksh93 safely
right now since inetd-based applications will fail at that point due to
the |libc::wordexp()| issue. And the OpenSolaris community is able to
collect far more data than we can - but to help them we have to make
such a test possible first. Using the Linux approach of
compile-the-patched-source-yourself will very likely limit the number of
testers to a mere handful of people (likely even excluding the
heavyweight stuff like installing an Oracle database).
We really need the OpenSolaris community for proper full-scale testing
and therefore we need that "oksh-links"-patch... ;-/

[snip]
> > > Carrying the old
> > > shell around just to support wordexp design flaws seems like the worst
> > > of all possible worlds.
> >
> > I agree. However we need a way to get ksh93 into position first to make
> > the required libraries (libast, libshell etc.) available in OS/Net - and
> > that means the current Solaris ksh needs a new place before that point.
> > We could do that all in one step, however I consider that slightly more
> > risky (for example we would have to patch libc AND add new
> > libraries+binaries in one step) - I would prefer the "bankers"-algorithm
> > style which moves from one safe position to another safe position (and
> > let the community test each single step/position) ... :-)
> 
> In that case, it sounds like you're actually advocating my previous
> (and not preferred) "option 1:" integrate ksh93 as /usr/bin/ksh93 now,
> and attempt the /usr/bin/ksh transition later.

Well, yes... for testing it may be helpful (please please no flamewar
on that) - and we can do the "ksh93-as-/usr/bin/ksh"-ARC case in
_parallel_ to the general bugfixing and integration work (for example
switching tools such as the "zfs" utility over to use libshell.so). And
we would have completed a milestone which is visible outside of this
community, too...

> That'd be acceptable (in fact, probably trivial) from an ARC point of
> view, but based on previous messages on this thread, it sounds like
> "/usr/bin/ksh must always be ksh93 or Solaris is doomed, I tell ya,
> doomed" is the prevailing opinion among a vocal set of users.  Any
> such plan likely has to skate between the requirements of those users
> and the Solaris compatibility requirements.

I know... maybe doing the integration + a ksh-version-switch script in
parallel to the ARC case will satisfy the people (for the moment). It
would also solve my concerns that I wanted to start with "ksh93s"
(current version is "ksh93r") for the Solaris integration - which seems
to be some time away from now... ;-(

April: Any ideas/comments/suggestions ?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
