On Wed, Jun 15, 2022 at 09:05:48AM +0800, Qian Yun wrote:
>
> On 6/15/22 00:28, Waldek Hebisch wrote:
> >
> >If you take seriously possibility that viewman may die, then
> >natural thing is to automatically restart it. And this is
> >what 'sman' is doing now.
> >
>
> First, the source code of viewman is pretty short and simple,
> so it is unlikely to die.
>
> Second, if in the unlikely cases that viewman dies for some
> reason, and sman restarts it, it is very likely that
> viewman will die again for the same reason. And now it is
> in infinite loop. Which is the problem I encountered in the
> first place.

_Assuming_ our programs are correct, a reasonable cause of dying is some _intermittent_ system problem, like getting wrong bits from RAM/HDD or the OOM killer making a wrong choice. Even in the case of bugs, intermittent bugs are quite likely: viewman deals with sockets and related timing issues, so a lot of things are nondeterministic. Deterministic bugs can be found and fixed much more easily than intermittent ones, so after some time spent on debugging, the remaining bugs are likely to be intermittent...

More to the point: you looked at the respawning issue because there was a missing library. A missing library means a broken installation, so the real fix is to make sure that the library is present. IIUC, when building from source the link stage will fail in case of missing libraries, so only binary installs should matter. If the install is done by some tool (script), the tool is supposed to ensure that the libraries are present. If the user is using a simple binary tarball like the ones I provide, the tarball has stated dependencies and the user is supposed to install them. Failing to install them may lead to a non-working FriCAS. Since the user will get an error message, I do not see this as a significant issue: the user made a mistake, the user got an error message, the user will correct the problem.

Coming back to respawning: I do not know if it is really necessary. It may be just defensive programming (sensible, because modern OSes work essentially in a probabilistic way: with small but nonzero probability you may get essentially random failures). It may be an attempt at masking errors: incorrect programming may increase the chance of error enough to cause trouble, and respawning may mask it. I did eliminate some things of similar spirit, but my procedure was to make the modification in my local copy of FriCAS, use it for some time (say a year), and commit only if I saw no bad effects of the change. And in a few cases I noticed that seemingly useless code was in fact doing a useful thing, so I reverted the change.

--
                              Waldek Hebisch

--
You received this message because you are subscribed to the Google Groups "FriCAS - computer algebra system" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/fricas-devel/20220618012359.GA10710%40fricas.math.uni.wroc.pl.