On Sun, 23 Aug 2015 11:44:13 -0700, you wrote:

>Ok. In my case however -
>
>Process A writes to shared memory only.
>Process B Reads from shared memory only.

Ok, so that eliminates one form of data corruption.

>
>As it stands Process B starts off with a variable set to 0x00, then
>compares this to a byte position in the file. When Process B first starts,
>this comparison will always fail. Process B then copies the contents of the
>file, sets the variable to the value at the byte position.
>Then sends the data out over a websocket.

Ok:

1) what stops process A from writing to the shared buffer if process B
is reading it?

2) what keeps B from getting an incomplete or inaccurate value from
process A for the byte position? Is it a byte variable or is it an
integer? Does the processor write this as an integer in one
uninterruptible operation?

3) if both A and B access Internet devices (over the same interface
I'd guess), what stops the data collision between process A and
process B? What protects that Internet resource? What is the result
if both A and B read a status register at the same time (in the
hardware)?

Harvey

>
>On the next iteration of the loop cycle. Process B then reads this value
>again, makes the comparison - which will likely succeed. The loop cycle
>then continues until this comparison fails again. Where the logic process
>repeats. It's pretty simple - Or so I thought.
>
>The reasoning for this development model is simple. Code segregation. Code
>in process B does not play well with the code in process A. They're both
>accessing network devices, and when that happens simultaneously - Data gets
>lost. Which happens more often than not.
>
>On Sun, Aug 23, 2015 at 9:39 AM, Harvey White <[email protected]>
>wrote:
>
>> On Sun, 23 Aug 2015 08:52:53 -0700, you wrote:
>>
>> >Hi Harvey,
>> >
>> >Thanks for the response. I think the biggest question in my mind is - Ok,
>> >so perhaps I have a synchronization problem that rears its head once in a
>> >while. But is this really that much of a problem which may cause both
>> >processes to stop ?
>> >
>> >A sample here and there once in a while that does not display, because it
>> >is malformed does not bother me. The processes stopping - does. I can not
>> >see how this could be causing the processes to stop. However . . . I
>> >honestly do not know one way or the other.
>>
>> Process A: while process B is busy, wait, then read from process B
>>
>> Process B: while process A is busy, wait, then read from process A
>>
>> Classic deadlock.
>>
>> Process A: wait for permission to read special area, read, then wait
>> outside that permission area. No restrictions on process B except
>> when accessing special area (which happens infrequently).
>>
>> Process B: wait for permission to read special area, read, then wait
>> outside that permission area. No restrictions on process A except
>> when accessing special area (which happens infrequently).
>>
>> Since the waiting is outside that special area, and the processes are
>> not allowed to hog the special area (and block the other process),
>> then neither process can block the other except for a very brief time.
>>
>> The implication is that the process check and access special area
>> takes a very small time, and the wait/do something else part takes a
>> longer time.
>>
>> Harvey
>>
>> >On Sun, Aug 23, 2015 at 8:43 AM, Harvey White <[email protected]>
>> >wrote:
>> >
>> >> On Sun, 23 Aug 2015 08:25:02 -0700, you wrote:
>> >>
>> >> >HI Przemek,
>> >> >
>> >> >> Since this involves two processes that as you say stop simultaneously,
>> >> >> I'd suspect a latent synchronization bug. You don't say how you
>> >> >> interlock your shared memory, but one possibility is that your reader
>> >> >> code gets stuck because you overwrite the data while it's reading it.
>> >> >> Debugging this type of thing is tricky, but maybe write a state
>> >> >> machine that lights some LEDs that show the phases of your
>> >> >> synchronization process, and wait to see where it's stuck.
>> >> >
>> >> >
>> >> >Currently, I have no synchronization. At one point I was using a byte in
>> >> >shared memory as a binary stopgap, but after a while it was not working
>> >> >predictably. Now, I'm re-reading documentation on POSIX semaphores, and
>> >> >creating a semaphore in shared memory, instead of using a system wide
>> >> >resource.
>> >>
>> >> Then you have two things that happen with no predictable time
>> >> relationship to each other at all.
>> >>
>> >> You could be writing part of a multibyte message when trying to read
>> >> that message with another process.
>> >>
>> >> A binary semaphore controls access to the shared (message) resource.
>> >> Checking the binary semaphore generally involves turning off
>> >> interrupts so that the other process can't grab control during the
>> >> check code. If you have two separate processors, you still need to
>> >> deal with the same thing, not so much interrupts, but permission to
>> >> access. The semaphore read/write must be atomic, and the access must
>> >> be negotiated between the two processors (generally happens in
>> >> hardware for two processors, happens in software for two processes
>> >> running on the same processor).
>> >> >
>> >> >> I'd definitely look at this malformation---it could be the smoke from
>> >> >> the real fire. Or not. In any case, this one should be easier to
>> >> >> find---just wait for the message, inspect the data in firebug, and
>> >> >> write a checker routine, inspecting your outgoing data, that watches
>> >> >> for this type of distortion.
>> >> >
>> >> >
>> >> >The first thing that comes to mind here, which I forgot to add to my post
>> >> >last night is that I am not zeroing out the shared memory file before
>> >> >usage. I know this is bad . . .but am not convinced this is what the
>> >> >problem is. However since it is / can be a one line of code fix. I will do
>> >> >so. The odd thing here is that I get maybe 1-2 notifications an hour - If
>> >> >that. Then it is inside the actual json object ( string pointer - e.g. char
>> >> >*buffer ) - not outside.
>> >> >
>> >> >What does all this mean to me. The first impression that I get out of this
>> >> >is that it is a synchronization issue. I'm still not convinced though . . .
>> >> >
>> >>
>> >> analyze the code to see what happens if one process is writing while
>> >> the other is reading.
>> >>
>> >> The error rate may be just a measure of how frequently this happens.
>> >>
>> >> Harvey
>> >>
>> >>
>> >> >Also, for what it's worth. I'm using mmap() and not file open(), read(),
>> >> >write(). So the code is very fast.
>> >> >
>> >> >On Sun, Aug 23, 2015 at 6:40 AM, Przemek Klosowski <
>> >> >[email protected]> wrote:
>> >> >
>> >> >> On Sun, Aug 23, 2015 at 1:31 AM, William Hermans <[email protected]>
>> >> >> wrote:
>> >> >> > So I have a problem with some code I've been working on for the last few
>> >> >> > months. The code, which is compiled into two separate processes suddenly
>> >> >> > stops working. No error, nothing in dmesg, nothing in any file in /var/log
>> >> >> > period. It did however occur to me that since rsyslog is likely or possible
>> >> >> > disabled.
>> >> >> >
>> >> >> > What my code does is read from the CAN peripheral. Form extended packets out
>> >> >> > of the CAN frames( NMEA 2000 fastpackets ), and then writes the data into a
>> >> >> > POSIX shared memory file ( /dev/shm/file ).
>> >> >>
>> >> >> Since this involves two processes that as you say stop simultaneously,
>> >> >> I'd suspect a latent synchronization bug. You don't say how you
>> >> >> interlock your shared memory, but one possibility is that your reader
>> >> >> code gets stuck because you overwrite the data while it's reading it.
>> >> >> Debugging this type of thing is tricky, but maybe write a state
>> >> >> machine that lights some LEDs that show the phases of your
>> >> >> synchronization process, and wait to see where it's stuck.
>> >> >>
>> >> >> > The second process simply reads
>> >> >> > from the file, and shuffles the data out over a websocket in json / human
>> >> >> > readable form. The data on the webside of things is tested accurate,
>> >> >> > although I do occasionally get a malformed json object warning from firefox
>> >> >> > firebug.
>> >> >>
>> >> >> I'd definitely look at this malformation---it could be the smoke from
>> >> >> the real fire. Or not. In any case, this one should be easier to
>> >> >> find---just wait for the message, inspect the data in firebug, and
>> >> >> write a checker routine, inspecting your outgoing data, that watches
>> >> >> for this type of distortion.
>> >> >>
>> >> >> --
>> >> >> For more options, visit http://beagleboard.org/discuss
>> >> >> ---
>> >> >> You received this message because you are subscribed to the Google Groups
>> >> >> "BeagleBoard" group.
>> >> >> To unsubscribe from this group and stop receiving emails from it, send an
>> >> >> email to [email protected].
>> >> >> For more options, visit https://groups.google.com/d/optout.
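
The thread asks what stops process A from writing while process B is
reading, and mentions putting a POSIX semaphore in shared memory. Here is
a minimal sketch of that idea in C. The struct layout, the names
area_create, area_publish, area_snapshot and demo_torn_reads, the 64-byte
message size, and the anonymous mapping shared across fork() are all
assumptions made for illustration. The code discussed in the thread maps a
file under /dev/shm and is not shown here, so treat this as one possible
shape, not as that code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <semaphore.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_LEN 64  /* hypothetical payload size */

/* Hypothetical layout of the shared region. */
struct shared_area {
    sem_t   lock;          /* binary semaphore shared between processes */
    uint8_t seq;           /* change marker the reader compares against */
    char    msg[MSG_LEN];  /* payload: written by A, read by B */
};

/* Map a zeroed shared region and set up the semaphore inside it.
 * Assumption: anonymous mapping; a real setup would shm_open() + mmap(). */
static struct shared_area *area_create(void)
{
    struct shared_area *a = mmap(NULL, sizeof *a, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return NULL;
    memset(a, 0, sizeof *a);              /* zero the region before use */
    if (sem_init(&a->lock, 1, 1) != 0) {  /* pshared = 1, initial value 1 */
        munmap(a, sizeof *a);
        return NULL;
    }
    return a;
}

/* Process A: update payload and marker inside the critical section. */
static void area_publish(struct shared_area *a, char fill)
{
    sem_wait(&a->lock);
    memset(a->msg, fill, MSG_LEN);
    a->seq++;                             /* wraps at 256; fine for a sketch */
    sem_post(&a->lock);
}

/* Process B: copy out under the lock. Returns 1 if new data was copied. */
static int area_snapshot(struct shared_area *a, uint8_t *last_seq, char *out)
{
    int fresh = 0;
    sem_wait(&a->lock);
    if (a->seq != *last_seq) {
        memcpy(out, a->msg, MSG_LEN);
        *last_seq = a->seq;
        fresh = 1;
    }
    sem_post(&a->lock);
    return fresh;                         /* slow work happens after unlock */
}

/* Fork a writer, poll as the reader, and count copies whose bytes are not
 * all identical, which would indicate a read that overlapped a write. */
static int demo_torn_reads(int rounds)
{
    assert(rounds >= 0);
    struct shared_area *a = area_create();
    if (a == NULL)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        munmap(a, sizeof *a);
        return -1;
    }
    if (pid == 0) {                       /* child plays process A */
        for (int i = 0; i < rounds; i++)
            area_publish(a, (char)('a' + i % 26));
        _exit(0);
    }
    uint8_t last = 0;
    char copy[MSG_LEN];
    int torn = 0, status = 0;
    while (waitpid(pid, &status, WNOHANG) == 0) {  /* until writer exits */
        if (area_snapshot(a, &last, copy)) {
            for (int k = 1; k < MSG_LEN; k++) {
                if (copy[k] != copy[0]) { torn++; break; }
            }
        }
    }
    sem_destroy(&a->lock);
    munmap(a, sizeof *a);
    return torn;
}
```

Both critical sections only copy a few bytes and never wait on anything
else while holding the lock, so neither process can block the other for
more than a very brief time. Calling demo_torn_reads(2000) from main()
should return 0 if the lock is doing its job. On older glibc versions this
may need to be linked with -pthread.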
