On Wed, Jun 15, 2022 at 09:05:48AM +0800, Qian Yun wrote:
> 
> On 6/15/22 00:28, Waldek Hebisch wrote:
> >
> >If you take seriously possibility that viewman may die, then
> >natural thing is to automatically restart it.  And this is
> >what 'sman' is doing now.
> >
> 
> First, the source code of viewman is pretty short and simple,
> so it is unlikely to die.
> 
> Second, if in the unlikely case that viewman dies for some
> reason, and sman restarts it, it is very likely that
> viewman will die again for the same reason.  And now it is
> in an infinite loop, which is the problem I encountered in
> the first place.

_Assuming_ our programs are correct, the reasonable cause of
dying is some _intermittent_ system problem, like getting wrong
bits from RAM/HDD or the OOM killer making a wrong choice.  Even
in the case of bugs, intermittent bugs are quite likely: viewman
deals with sockets and related timing issues, so a lot of things
are nondeterministic.  Deterministic bugs can be found and fixed
much more easily than intermittent ones, so after some time spent
on debugging, the remaining bugs are likely to be intermittent...

More to the point: you looked at the respawning issue because
there was a missing library.  A missing library means a broken
installation, so the real fix is to make sure that the library
is present.  IIUC, when building from source the link stage will
fail in case of missing libraries, so only binary installs
should matter.  If the install is done by some tool (script), the
tool is supposed to ensure that the libraries are present.
If the user is using a simple binary tarball like the ones I
provide, the tarball has stated dependencies and the user is
supposed to install them.  Failing to install them may lead to
a non-working FriCAS.  Since the user will get an error message,
I do not see this as a significant issue: the user made a
mistake, the user got an error message, the user will correct
the problem.

Coming back to respawning: I do not know if it is really
necessary.  It may be just defensive programming
(sensible because modern OS-es work essentially in a
probabilistic way: with small but nonzero probability
you may get essentially random failures).  It may be an
attempt at masking errors: incorrect programming may
increase the chance of error enough to be a problem,
and respawning may mask it.

I did eliminate some things of a similar spirit, but my
procedure was to make the modification in my local copy of
FriCAS, use it for some time (say a year), and commit
only if I saw no bad effects of the change.  And in a
few cases I noticed that seemingly useless code was in
fact doing something useful, so I reverted the change.

-- 
                              Waldek Hebisch

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to fricas-devel+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/fricas-devel/20220618012359.GA10710%40fricas.math.uni.wroc.pl.
