On Fri, Apr 19, 2024 at 09:02:59PM +0800, Qian Yun wrote:
> 
> 
> On 4/19/24 20:41, Waldek Hebisch wrote:
> > 
> > Long ago I looked at the communication protocol between the various
> > processes that we use, and my conclusion was that it is
> > inherently racy: there are parallel channels of communication
> > and both ends assume that data comes in the right order.  I added
> > a little piece of code to detect lost races, which mitigates the
> > worst things.  Machines now are faster than in the past,
> > so lost races are probably quite rare given the 1s delay.
> > It is quite possible that the original authors, after realizing
> > that there are races, put delays in places where they are not necessary
> > (and a wrongly placed delay could even make races worse).
> > 
> > To explain more: most of the C code uses sockets, and this alone should
> > be OK.  However, FRICASsys sends textual output on
> > standard output, which is captured by 'sman'.  But "event indicators"
> > are sent via sockets, so we depend on data on FRICASsys standard
> > output and data coming via sockets arriving in the order in which
> > they were sent.
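To make the ordering problem concrete, here is a minimal C sketch of one way a receiver could cope with two independent channels. It assumes, hypothetically (this is not how sman is known to work), that each event indicator carries the number of bytes the sender had already written to its standard output, so the receiver drains that much text before acting on the event. `struct event`, `drain_text_until` and `demo_ordered_receive` are illustrative names only, and the demo runs sender and receiver in one process, so it shows the receiver logic rather than an actual reordering.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical event record: besides the event code, the sender reports
   how many bytes it had written to its text channel (stdout) so far. */
struct event {
    int32_t  code;
    uint64_t text_offset;
};

/* Read from text_fd until *have reaches target, so that all text written
   before the event is in buf.  Returns 0 on success, -1 on EOF/error or
   if buf cannot hold target bytes. */
static int drain_text_until(int text_fd, char *buf, size_t cap,
                            size_t *have, uint64_t target)
{
    if (target > cap)
        return -1;
    while (*have < target) {
        ssize_t n = read(text_fd, buf + *have, (size_t)target - *have);
        if (n <= 0)
            return -1;
        *have += (size_t)n;
    }
    return 0;
}

/* Single-process demo: text travels through a pipe (like FRICASsys
   stdout captured by sman), the event through a socketpair.  The
   receiver acts on the event only after draining the text it covers. */
static int demo_ordered_receive(char *out, size_t cap, int32_t *code)
{
    int text[2], ev[2], rc = -1;
    const char *msg = "result\n";
    size_t len = strlen(msg), have = 0;
    struct event e, got;

    if (cap == 0 || pipe(text) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, ev) < 0) {
        close(text[0]);
        close(text[1]);
        return -1;
    }

    /* Sender: text first, then the event tagged with the text offset. */
    memset(&e, 0, sizeof e);
    e.code = 42;
    e.text_offset = len;
    if (write(text[1], msg, len) != (ssize_t)len ||
        write(ev[1], &e, sizeof e) != (ssize_t)sizeof e)
        goto out;

    /* Receiver: even if the event is seen first, catch up on the text. */
    if (read(ev[0], &got, sizeof got) != (ssize_t)sizeof got)
        goto out;
    if (drain_text_until(text[0], out, cap - 1, &have, got.text_offset) < 0)
        goto out;
    out[have] = '\0';
    *code = got.code;
    rc = 0;
out:
    close(text[0]);
    close(text[1]);
    close(ev[0]);
    close(ev[1]);
    return rc;
}
```

The design choice here is to make the ordering explicit in the data instead of relying on timing; whether something like this fits the existing protocol would need the deeper analysis mentioned below.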
> 
> In this particular case, it is not socket, but pipe.
> 
> > >   The only reason to put a "sleep" here,
> > > I presume it is a workaround for a bug in viewAlone:
> > > 
> > > See the "printf" I removed below: it writes to stdout instead of
> > > stderr, causing the parent process function "readViewport" to return
> > > early and the parent process to exit, so the data passed to the child
> > > process via the pipe is lost.
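A small C sketch of why a stray printf matters: when a parent reads a child's standard output through a pipe, anything the child prints to stdout becomes part of what the parent parses as the reply. The child below sends its diagnostic to stderr and only the reply to stdout. `child_main` and `run_child_and_read` are hypothetical names, not functions from viewAlone, and the reply value is made up for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child: its stdout is the data channel the parent reads,
   so the diagnostic goes to stderr and only the reply goes to stdout. */
static void child_main(void)
{
    fprintf(stderr, "child: all data received\n"); /* diagnostic */
    printf("%d\n", 1);                              /* reply for parent */
    fflush(stdout);
}

/* Run child_main with stdout redirected into a pipe and collect
   everything it wrote there.  Returns 0 on success, -1 on error. */
static int run_child_and_read(char *buf, size_t cap)
{
    int fd[2];
    size_t have = 0;
    ssize_t n;
    pid_t pid;

    if (cap == 0 || pipe(fd) < 0)
        return -1;
    fflush(stdout); /* so the child does not inherit unflushed output */
    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        close(fd[0]);
        dup2(fd[1], STDOUT_FILENO);
        close(fd[1]);
        child_main();
        _exit(0);
    }
    close(fd[1]);
    while (have < cap - 1 && (n = read(fd[0], buf + have, cap - 1 - have)) > 0)
        have += (size_t)n;
    buf[have] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```

If the `fprintf(stderr, ...)` line were a `printf`, the parent would read the diagnostic text first and could misinterpret it as the reply.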
> > 
> > IIRC 'viewAlone' is started from HyperDoc when you click on an image.
> > Normal graphics uses 'view2D' and 'view3D' via 'viewman'.
> > 
> > Anyway, I think it would be reasonably safe to use smaller delay,
> > say 50 or 100 milliseconds (I use such delay during startup).
> > Quite possibly we can eliminate the delays, but that IMO
> > would require deeper analysis and more testing.  As I wrote, we
> > have several channels of communication and the code assumes certain
> > ordering constraints.  Without identifying the constraints (and some
> > could be far from obvious) and analysing them, it is hard to say
> > more than "there are races".
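If the delay is kept but shortened, a 100 ms pause can be written with POSIX `nanosleep` instead of `sleep(1)`. The helper `sleep_ms` below is a hypothetical name, not existing FriCAS code; it resumes with the remaining time if a signal interrupts the sleep, so the pause is never cut short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for ms milliseconds; if a signal interrupts the sleep,
   continue with the remaining time instead of returning early. */
static int sleep_ms(long ms)
{
    struct timespec req, rem;

    if (ms < 0)
        return -1;
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Usage would be `sleep_ms(100);` wherever the code currently calls `sleep(1);`.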
> 
> This case is rather straightforward, I think: viewman/viewAlone
> forks and passes data to the child process view2D/view3D via a pipe,
> then the child process writes a value after receiving all data,
> then viewAlone exits while viewman sends it via a socket to fricas
> and continues to run.
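The handshake described above can be sketched in C as follows. This is a simplified, hypothetical model, not the viewman/viewAlone code: the parent forks, streams the data over a pipe and closes it, the child reads until EOF and then writes one value back on a second pipe, and the parent blocks on that value before returning. Because the parent's blocking read completes only after the child has consumed all data, no sleep is needed for this particular exchange. Assumptions: end of data is signalled by EOF, the acknowledged value is the byte count, and `send_and_wait_ack` is an illustrative name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, stream len bytes to it over a pipe, and block until the
   child acknowledges.  Returns the child's ack (here: the number of
   bytes it received), or -1 on error. */
static int send_and_wait_ack(const char *data, size_t len)
{
    int to_child[2], from_child[2];
    int32_t ack;
    size_t off = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: read until EOF, then acknowledge. */
        char buf[256];
        int32_t total = 0;
        close(to_child[1]);
        close(from_child[0]);
        while ((n = read(to_child[0], buf, sizeof buf)) > 0)
            total += (int32_t)n;
        if (write(from_child[1], &total, sizeof total) != (ssize_t)sizeof total)
            _exit(1);
        _exit(0);
    }
    /* Parent: send everything, close to signal EOF, wait for the ack. */
    close(to_child[0]);
    close(from_child[1]);
    while (off < len && (n = write(to_child[1], data + off, len - off)) > 0)
        off += (size_t)n;
    close(to_child[1]);
    n = read(from_child[0], &ack, sizeof ack);
    close(from_child[0]);
    waitpid(pid, NULL, 0);
    return n == (ssize_t)sizeof ack ? (int)ack : -1;
}
```

This only establishes ordering between these two processes; it does not say anything about timing relative to the other channels in the system.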
> 
> I can accept the 100ms sleep as well.  But could you give this
> another review based on the new information I provided?

_Locally_ this should work fine without the delay.  But the question
is how this affects timing in the whole constellation of processes
that we use.

I do not say that the change is bad.  Simply, on a fast unloaded
machine in a fixed configuration, timings typically are reasonably
repeatable and it may be very hard to reproduce a problem
(if there are any).  OTOH, in a different configuration timing
problems may appear.  From my point of view testing is not
enough to prove the absence of trouble, and since there are
races, a simple theoretical argument cannot work either.  Pragmatically
we can accept some risk of breakage, but I prefer to be
conservative here: accepting changes that risk breakage can
have a cumulative effect and significantly degrade code quality.
And to tell the truth, I am much more willing to accept some
deterministic risk, as such a problem, once discovered, can be
reproduced and fixed.  Non-deterministic breakage can live long...

-- 
                              Waldek Hebisch

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/fricas-devel/ZiKAOi--bW1sHGl_%40fricas.org.