Thanks for the info, guys. I will definitely look into using semaphores; I actually found a decent read on them the night I made this post.
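A minimal sketch of the semaphore idea, under several assumptions: the struct layout, the `SAMPLE_MAX` size, and the `area_*` function names are all hypothetical and not taken from the real code. It places an unnamed POSIX semaphore inside the shared mapping, so that the writer publishes a whole sample and the reader copies a whole sample. It zeroes the region at startup with memset() rather than /dev/zero. It also assumes the writer has finished area_create() before any reader calls area_open(). On older glibc versions this may need -lrt -pthread at link time.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SAMPLE_MAX 64   /* hypothetical payload size, not from the real code */

/* Hypothetical layout: the semaphore lives inside the mapping itself. */
struct shared_area {
    sem_t  lock;              /* binary semaphore guarding len and data */
    size_t len;               /* number of valid bytes in data[] */
    char   data[SAMPLE_MAX];
};

/* Map an existing or newly created object; returns NULL on failure. */
static struct shared_area *area_map(const char *name, int oflag)
{
    int fd = shm_open(name, oflag, 0600);
    if (fd == -1)
        return NULL;
    if ((oflag & O_CREAT) && ftruncate(fd, sizeof(struct shared_area)) == -1) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct shared_area), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}

/* Writer side, once at startup: zero the region, then init the semaphore. */
struct shared_area *area_create(const char *name)
{
    struct shared_area *a = area_map(name, O_CREAT | O_RDWR);
    if (a == NULL)
        return NULL;
    memset(a, 0, sizeof *a);
    if (sem_init(&a->lock, 1, 1) == -1) {   /* pshared = 1, initial value 1 */
        munmap(a, sizeof *a);
        return NULL;
    }
    return a;
}

/* Reader side: attach to a region the writer has already set up. */
struct shared_area *area_open(const char *name)
{
    return area_map(name, O_RDWR);
}

/* Retry sem_wait() if a signal interrupts it. */
static int area_lock(struct shared_area *a)
{
    int rc;
    while ((rc = sem_wait(&a->lock)) == -1 && errno == EINTR)
        continue;
    return rc;
}

/* Publish one complete sample; the reader never sees a partial copy. */
int area_write(struct shared_area *a, const void *buf, size_t len)
{
    assert(len <= SAMPLE_MAX);
    if (area_lock(a) == -1)
        return -1;
    memcpy(a->data, buf, len);
    a->len = len;
    return sem_post(&a->lock);
}

/* Copy the latest sample out; returns its length, or -1 on error. */
ssize_t area_read(struct shared_area *a, void *out, size_t cap)
{
    if (area_lock(a) == -1)
        return -1;
    size_t n = a->len < cap ? a->len : cap;
    memcpy(out, a->data, n);
    sem_post(&a->lock);
    return (ssize_t)n;
}
```

In this sketch the CAN-side process would call area_create() once and area_write() per sample, while the websocket-side process would call area_open() and then area_read() in its loop. Both hold the semaphore only for the duration of one memcpy(), so neither can block the other for long.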
I will also zero out the shared memory file before each initial use, via /dev/zero. Well, only when the IPC server first starts.

On Sun, Aug 23, 2015 at 2:51 PM, William Hermans <[email protected]> wrote:

> OK so with all that in mind, I'm back to square 1. These processes can not
> share the same memory space. libmongoose seems to love stomping all over
> the stack, and I'm fairly sure it is not thread safe. Which is why I'm
> using two separate executables.
>
> *OR* maybe I could go crazy and malloc() everything ? heheh no way ;)
>
> On Sun, Aug 23, 2015 at 2:41 PM, Harvey White <[email protected]> wrote:
>
>> On Sun, 23 Aug 2015 14:05:12 -0700, you wrote:
>>
>> >Walter,
>> >
>> >Thank you for your reply.
>> >
>> >I've examined pretty much all of SYSV and POSIX IPC mechanisms. I'm no
>> >expert here, as this is really my first go with anything IPC, and pretty
>> >much my first "major" application running on Linux.
>>
>> Which means, perhaps, the first application where the OS is a real
>> factor.
>> >
>> >Pipes may not be fast enough for what I'm trying to accomplish. To keep an
>> >explanation short. I'm only tracking one PGN. A PGN for fastpackets is a
>> >set of data items in this case. For this one PGN I'm dealing with 3 items
>> >in data ( voltage, current and frequency ), but program wide I have to keep
>> >track of much more. This PGN is also only one of roughly 20. With most
>> >PGNs issuing data sets of varying length 2 times a second . . .
>>
>> The problem may be more of "how much data and how long to process it"
>> rather than the frequency of the data itself.
>>
>> You are correct to consider context switching time.
>>
>> >
>> >
>> >It may be I'll have to somehow rate limit the data I'll be dealing with. I
>> >did consider POSIX Message queues, but according to what I've read. POSIX
>> >shared memory is the fastest of all IPC mechanisms, and while I do agree
>> >that it is not very easy.
>> >Personally, I think shared memory is easy now
>> >that I understand a lot of it. At minimum, it's not very hard to understand the
>> >idea, and implement it in code. Semaphores, mutexes, and threads however I
>> >do find a bit intimidating. At minimum, I personally think they're overly
>> >complex.
>>
>> Hmmm, perhaps not quite that intimidating.
>>
>> A thread is a path of execution. A single program consisting of a
>> loop and a single interrupt has two threads.
>>
>> Threads share common resources, data, address space. It's up to you
>> to make them well behaved about what changes what and why.... That's
>> why microprocessors save the registers on the stack for an interrupt.
>>
>> Processes are threads with isolated resources. Each process ideally
>> thinks that it is the only thing running in a processor, and data just
>> "magically" appears. The OS's job is to keep the processes separate.
>>
>> Mutexes and semaphores are similar, and are synchronization mechanisms
>> between either threads or processes. Please look up the definition
>> and explanation of "critical section" in programming.
>>
>> The idea is to have a flag that can be changed without interference
>> from another process, or for that matter, can be read without
>> interfering with another process. This could be a complete message.
>>
>> The mutexes and semaphores serve to synchronize two processes which,
>> by the very nature of an operating system, *cannot* be guaranteed to
>> be synchronous.
>>
>>
>> >
>> >I have thought about a lot of different approaches, and I'm not saying my
>> >approach won't change. This is just where I am right now. Stumbling about
>> >learning the various Linux API's / libraries. Using, and understanding
>> >fork() is on my TODO list, I just have not made it there yet. These two
>> >processes are actually two separate executables. I am a bit worried about
>> >process context switching though.
>> >I mean I'm sure I am incurring some penalty
>> >right now running two separate executables, but I'm not sure it would be
>> >the same using threads.
>>
>> It actually would be the same with threads vs. processes. The only
>> real difference is that the threads share the same address space as
>> each other, so they have access to variables without a special
>> mechanism (which would take time).
>>
>> Processes, as I mentioned, run in their own worlds, with the operating
>> system controlling what they see (resources, shared memory, etc). That
>> mechanism has overhead.
>>
>> So yes, threads are faster than processes, but more dangerous.
>>
>> Harvey
>>
>> >
>> >On Sun, Aug 23, 2015 at 1:42 PM, William Hermans <[email protected]> wrote:
>> >
>> >> *1) what stops process A from writing to the shared buffer if process B*
>> >>> * is reading it?*
>> >>
>> >>
>> >> Nothing. I assume that writes are slower, or at most as fast as reads.
>> >> Both reads, and writes are done using a mmap'd pointer.
>> >>
>> >> *2) what keeps B from getting an incomplete or inaccurate value from*
>> >>> * process A for the byte position? is it a byte variable or is it an*
>> >>> * integer? Does the processor write this as an integer in one*
>> >>> * uninterruptible process?*
>> >>>
>> >>
>> >> Aside from the fact that the byte position I'm testing here is a source
>> >> ID, of two different devices. Nothing. They do come in - in order one after
>> >> the other however. This is not permanent however. When I start tracking
>> >> more data, for one set of data this will still work. But not for other sets
>> >> of data. Write / read type is char. No way really to get this wrong as
>> >> with gcc -Wall, gcc will warn. I have no errors or warning when compiling.
>> >>
>> >> 3) if both A and B access Internet devices (over the same interface
>> >> I'd guess), what stops the data collision between process A and
>> >> process B? What protects that Internet resource?
>> >> What is the result
>> >> if both A and B read a status register at the same time (in the
>> >> hardware)?
>> >>
>> >> No. I guess more correctly they are socket devices. Both using Linux
>> >> network sockets. socketcan for CANBus, and standard Linux sockets for
>> >> ethernet. The web libraries I did not write. It's libmongoose.
>> >>
>> >> On Sun, Aug 23, 2015 at 1:06 PM, Harvey White <[email protected]> wrote:
>> >>
>> >>> On Sun, 23 Aug 2015 11:44:13 -0700, you wrote:
>> >>>
>> >>> >Ok. In my case however -
>> >>> >
>> >>> >Process A writes to shared memory only.
>> >>> >Process B Reads from shared memory only.
>> >>>
>> >>> Ok, so that eliminates one form of data corruption.
>> >>> >
>> >>> >As it stands Process B starts off with a variable set to 0x00. then
>> >>> >compares this to a byte position in the file. When Process B first starts,
>> >>> >this comparison will always fail. Process B then copies the contents of the
>> >>> >file, sets the variable to the value at the byte position.
>> >>> >Then sends the data out over a websocket.
>> >>>
>> >>> Ok:
>> >>> 1) what stops process A from writing to the shared buffer if process B
>> >>> is reading it?
>> >>>
>> >>> 2) what keeps B from getting an incomplete or inaccurate value from
>> >>> process A for the byte position? is it a byte variable or is it an
>> >>> integer? Does the processor write this as an integer in one
>> >>> uninterruptible process?
>> >>>
>> >>> 3) if both A and B access Internet devices (over the same interface
>> >>> I'd guess), what stops the data collision between process A and
>> >>> process B? What protects that Internet resource? What is the result
>> >>> if both A and B read a status register at the same time (in the
>> >>> hardware)?
>> >>>
>> >>> Harvey
>> >>>
>> >>>
>> >>>
>> >>> >
>> >>> >On the next iteration of the loop cycle. Process B then reads this value
>> >>> >again, makes the comparison - which will likely succeed.
>> >>> >The loop cycle
>> >>> >then continues until this comparison fails again. Where the logic process
>> >>> >repeats. It's pretty simple - Or so I thought.
>> >>> >
>> >>> >The reasoning for this development model is simple. Code segregation. Code
>> >>> >in process B does not play well with the code in process A. They're both
>> >>> >accessing network devices, and when it happens simultaneously - Data gets
>> >>> >lost. Which happens more often than not.
>> >>> >
>> >>> >On Sun, Aug 23, 2015 at 9:39 AM, Harvey White <[email protected]> wrote:
>> >>> >
>> >>> >> On Sun, 23 Aug 2015 08:52:53 -0700, you wrote:
>> >>> >>
>> >>> >> >Hi Harvey,
>> >>> >> >
>> >>> >> >Thanks for the response. I think the biggest question in my mind is - Ok,
>> >>> >> >so perhaps I have a synchronization problem that rears its head once in a
>> >>> >> >while. But is this really that much of a problem which may cause both
>> >>> >> >processes to stop ?
>> >>> >> >
>> >>> >> >A sample here and there once in a while that does not display, because it
>> >>> >> >is malformed does not bother me. The processes stopping - does. I can not
>> >>> >> >see how this could be causing the processes to stop. However . . . I
>> >>> >> >honestly do not know one way or the other.
>> >>> >>
>> >>> >> Process A: while process B is busy, wait, then read from process B
>> >>> >>
>> >>> >> Process B: while process A is busy, wait, then read from process A
>> >>> >>
>> >>> >> Classic deadlock.
>> >>> >>
>> >>> >> Process A: wait for permission to read special area, read, then wait
>> >>> >> outside that permission area. No restrictions on process B except
>> >>> >> when accessing special area (which happens infrequently) .
>> >>> >>
>> >>> >> Process B: wait for permission to read special area, read, then wait
>> >>> >> outside that permission area.
>> >>> >> No restrictions on process A except
>> >>> >> when accessing special area (which happens infrequently) .
>> >>> >>
>> >>> >> Since the waiting is outside that special area, and the processes are
>> >>> >> not allowed to hog the special area (and block the other process),
>> >>> >> then neither process can block the other except for a very brief time.
>> >>> >>
>> >>> >> The implication is that the process check and access special area
>> >>> >> takes a very small time, and the wait/do something else part takes a
>> >>> >> longer time.
>> >>> >>
>> >>> >> Harvey
>> >>> >>
>> >>> >> >On Sun, Aug 23, 2015 at 8:43 AM, Harvey White <[email protected]> wrote:
>> >>> >> >
>> >>> >> >> On Sun, 23 Aug 2015 08:25:02 -0700, you wrote:
>> >>> >> >>
>> >>> >> >> >Hi Przemek,
>> >>> >> >> >
>> >>> >> >> >*Since this involves two processes that as you say stop simultaneously,*
>> >>> >> >> >> * I'd suspect a latent synchronization bug. You don't say how you*
>> >>> >> >> >> * interlock your shared memory, but one possibility is that your reader*
>> >>> >> >> >> * code gets stuck because you overwrite the data while it's reading it.*
>> >>> >> >> >> * Debugging this type of thing is tricky, but maybe write a state*
>> >>> >> >> >> * machine that lights some LEDs that show the phases of your*
>> >>> >> >> >> * synchronization process, and wait to see where it's stuck.*
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >Currently, I have no synchronization. At one point I was using a byte in
>> >>> >> >> >shared memory as a binary stopgap, but after a while it was not working
>> >>> >> >> >predictably. Now, I'm re-reading documentation on POSIX semaphores, and
>> >>> >> >> >creating a semaphore in shared memory, instead of using a system wide
>> >>> >> >> >resource.
>> >>> >> >>
>> >>> >> >> Then you have two things that happen with no predictable time
>> >>> >> >> relationship to each other at all.
>> >>> >> >>
>> >>> >> >> You could be writing part of a multibyte message when trying to read
>> >>> >> >> that message with another process.
>> >>> >> >>
>> >>> >> >> A binary semaphore controls access to the shared (message) resource.
>> >>> >> >> Checking the binary semaphore generally involves turning off
>> >>> >> >> interrupts so that the other process can't grab control during the
>> >>> >> >> check code. If you have two separate processors, you still need to
>> >>> >> >> deal with the same thing, not so much interrupts, but permission to
>> >>> >> >> access. The semaphore read/write must be atomic, and the access must
>> >>> >> >> be negotiated between the two processors (generally happens in
>> >>> >> >> hardware for two processors, happens in software for two processes
>> >>> >> >> running on the same processor).
>> >>> >> >> >
>> >>> >> >> >*I'd definitely look at this malformation---it could be the smoke from*
>> >>> >> >> >> * the real fire. Or not. In any case, this one should be easier to*
>> >>> >> >> >> * find---just wait for the message, inspect the data in firebug, and*
>> >>> >> >> >> * write a checker routine, inspecting your outgoing data, that watches*
>> >>> >> >> >> * for this type of distortion. *
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >The first thing that comes to mind here, which I forgot to add to my post
>> >>> >> >> >last night is that I am not zeroing out the shared memory file before
>> >>> >> >> >usage. I know this is bad . . .but am not convinced this is what the
>> >>> >> >> >problem is. However since it is / can be a one line of code fix. I will do
>> >>> >> >> >so. The odd thing here is that I get maybe 1-2 notifications an hour - If
>> >>> >> >> >that.
>> >>> >> >> >Then it is inside the actual json object ( string pointer - e.g. char
>> >>> >> >> >*buffer ) - not outside.
>> >>> >> >> >
>> >>> >> >> >What does all this mean to me. The first impression that I get out of this
>> >>> >> >> >is that it is a synchronization issue. I'm still not convinced though . . .
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >> analyze the code to see what happens if one process is writing while
>> >>> >> >> the other is reading.
>> >>> >> >>
>> >>> >> >> The error rate may be just a measure of how frequently this happens.
>> >>> >> >>
>> >>> >> >> Harvey
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> >Also, for what it's worth. I'm using mmap() and not file open(), read(),
>> >>> >> >> >write(). So the code is very fast.
>> >>> >> >> >
>> >>> >> >> >On Sun, Aug 23, 2015 at 6:40 AM, Przemek Klosowski <[email protected]> wrote:
>> >>> >> >> >
>> >>> >> >> >> On Sun, Aug 23, 2015 at 1:31 AM, William Hermans <[email protected]> wrote:
>> >>> >> >> >> > So I have a problem with some code I've been working on for the last few
>> >>> >> >> >> > months. The code, which is compiled into two separate processes suddenly
>> >>> >> >> >> > stops working. No error, nothing in dmesg, nothing in any file in /var/log
>> >>> >> >> >> > period. It did however occur to me that since rsyslog is likely or possible
>> >>> >> >> >> > disabled.
>> >>> >> >> >> >
>> >>> >> >> >> > What my code does is read from the CAN peripheral. Form extended packets out
>> >>> >> >> >> > of the CAN frames( NMEA 2000 fastpackets ), and then writes the data into a
>> >>> >> >> >> > POSIX shared memory file ( /dev/shm/file ).
>> >>> >> >> >>
>> >>> >> >> >> Since this involves two processes that as you say stop simultaneously,
>> >>> >> >> >> I'd suspect a latent synchronization bug. You don't say how you
>> >>> >> >> >> interlock your shared memory, but one possibility is that your reader
>> >>> >> >> >> code gets stuck because you overwrite the data while it's reading it.
>> >>> >> >> >> Debugging this type of thing is tricky, but maybe write a state
>> >>> >> >> >> machine that lights some LEDs that show the phases of your
>> >>> >> >> >> synchronization process, and wait to see where it's stuck.
>> >>> >> >> >>
>> >>> >> >> >> > The second process simply reads
>> >>> >> >> >> > from the file, and shuffles the data out over a websocket in json / human
>> >>> >> >> >> > readable form. The data on the webside of things is tested accurate,
>> >>> >> >> >> > although I do occasionally get a malformed json object warning from firefox
>> >>> >> >> >> > firebug.
>> >>> >> >> >>
>> >>> >> >> >> I'd definitely look at this malformation---it could be the smoke from
>> >>> >> >> >> the real fire. Or not. In any case, this one should be easier to
>> >>> >> >> >> find---just wait for the message, inspect the data in firebug, and
>> >>> >> >> >> write a checker routine, inspecting your outgoing data, that watches
>> >>> >> >> >> for this type of distortion.
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> For more options, visit http://beagleboard.org/discuss
>> >>> >> >> >> ---
>> >>> >> >> >> You received this message because you are subscribed to the Google Groups
>> >>> >> >> >> "BeagleBoard" group.
>> >>> >> >> >> To unsubscribe from this group and stop receiving emails from it, send an
>> >>> >> >> >> email to [email protected].
>> >>> >> >> >> For more options, visit https://groups.google.com/d/optout.
