Re: [beagleboard] Could use some bug tracking advice.

William Hermans Sun, 23 Aug 2015 15:38:17 -0700

Thanks for the info, guys. I will definitely look into using semaphores, and
I actually found a decent read on them the night I made this post.
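
For reference, a minimal sketch of the kind of guard being discussed: a binary
POSIX semaphore placed inside the shared region, so the writer and the reader
never touch the payload at the same time. The struct layout, the names
region_init, region_write and region_read, and the 256-byte payload size are
all assumptions for illustration, not the actual code from this project.

```c
#include <semaphore.h>
#include <string.h>
#include <sys/types.h>

#define PAYLOAD_MAX 256          /* hypothetical payload size */

/* Hypothetical layout of the shared region: the lock lives next to the data. */
struct shared_region {
    sem_t  lock;                 /* binary semaphore shared between processes */
    size_t len;                  /* number of valid bytes in payload */
    char   payload[PAYLOAD_MAX];
};

/* Call once, from the process that creates the region.
 * pshared = 1 makes the semaphore usable across processes,
 * initial value 1 means "unlocked". */
int region_init(struct shared_region *r)
{
    r->len = 0;
    return sem_init(&r->lock, 1, 1);
}

/* Writer side: hold the lock only for the duration of the copy. */
int region_write(struct shared_region *r, const void *buf, size_t n)
{
    if (n > PAYLOAD_MAX)
        return -1;
    if (sem_wait(&r->lock) == -1)
        return -1;               /* e.g. interrupted by a signal */
    memcpy(r->payload, buf, n);
    r->len = n;
    return sem_post(&r->lock);
}

/* Reader side: copy out a consistent snapshot, then release the lock. */
ssize_t region_read(struct shared_region *r, void *buf, size_t cap)
{
    if (sem_wait(&r->lock) == -1)
        return -1;
    size_t n = r->len < cap ? r->len : cap;
    memcpy(buf, r->payload, n);
    if (sem_post(&r->lock) == -1)
        return -1;
    return (ssize_t)n;
}
```

Both executables would map the same object and cast the mapping to
struct shared_region *; only the side that creates the region should call
region_init. Older glibc versions need -pthread at link time.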


I will also zero out the shared memory file via /dev/zero before its first
use, but only when the IPC server first starts.
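
A minimal sketch of that startup step, assuming the region is created with
shm_open() and mapped with mmap(). It clears the mapping with memset() rather
than copying from /dev/zero, which is a swapped-in but equivalent way to get
all-zero bytes. The function name map_shared and the zero_it flag are
hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (if needed) and map a POSIX shared memory object.
 * name    - object name such as "/myshm" (shows up under /dev/shm on Linux)
 * size    - size of the region in bytes
 * zero_it - nonzero only in the process that starts first (the IPC server)
 * Returns the mapping, or NULL on failure. */
void *map_shared(const char *name, size_t size, int zero_it)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return NULL;

    /* A freshly created object has length 0, so it must be sized first.
     * Growing it this way yields zero bytes, but an object left over from
     * a previous run keeps its old contents. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    if (zero_it)
        memset(p, 0, size);      /* wipe any stale data from an earlier run */
    return p;
}
```

The reader process would call the same function with zero_it set to 0 so that
it never wipes data the writer has already produced. Older glibc versions need
-lrt at link time for shm_open().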

On Sun, Aug 23, 2015 at 2:51 PM, William Hermans <[email protected]> wrote:

> OK so with all that in mind, I'm back to square 1. These processes can not
> share the same memory space. libmongoose seems to love stomping all over
> the stack, and I'm fairly sure it is not thread safe. Which is why I'm
> using two separate executables.
>
> *OR* maybe I could go crazy and malloc() everything ? heheh no way ;)
>
> On Sun, Aug 23, 2015 at 2:41 PM, Harvey White <[email protected]>
> wrote:
>
>> On Sun, 23 Aug 2015 14:05:12 -0700, you wrote:
>>
>> >Walter,
>> >
>> >Thank you for your reply.
>> >
>> >I've examined pretty much all of SYSV and POSIX IPC mechanisms. I'm no
>> >expert here, as this is really my first go with anything IPC, and pretty
>> >much my first "major" application running on Linux.
>>
>> Which means, perhaps, the first application where the OS is a real
>> factor.
>> >
>> >Pipes may not be fast enough for what I'm trying to accomplish. To keep the
>> >explanation short: I'm only tracking one PGN. A PGN for fastpackets is a
>> >set of data items in this case. For this one PGN I'm dealing with 3 data
>> >items ( voltage, current and frequency ), but program wide I have to keep
>> >track of much more. This PGN is also only one of roughly 20, with most
>> >PGNs issuing data sets of varying length 2 times a second . . .
>>
>> The problem may be more of "how much data and how long to process it"
>> rather than the frequency of the data itself.
>>
>> You are correct to consider context switching time.
>>
>> >
>> >
>> >It may be I'll have to somehow rate limit the data I'll be dealing with. I
>> >did consider POSIX message queues, but according to what I've read, POSIX
>> >shared memory is the fastest of all IPC mechanisms. While I do agree
>> >that it is not very easy, I personally think shared memory is easy now
>> >that I understand a lot of it. At minimum, it's not very hard to understand
>> >the idea and implement it in code. Semaphores, mutexes, and threads, however,
>> >I do find a bit intimidating. At minimum, I personally think they're overly
>> >complex.
>>
>> Hmmm, perhaps not quite that intimidating.
>>
>> A thread is a path of execution.  A single program consisting of a
>> loop and a single interrupt has two threads.
>>
>> Threads share common resources, data, address space.  It's up to you
>> to make them well behaved about what changes what and why.... That's
>> why microprocessors save the registers on the stack for an interrupt.
>>
>> Processes are threads with isolated resources.  Each process ideally
>> thinks that it is the only thing running in a processor, and data just
>> "magically" appears.  The OS's job is to keep the processes separate.
>>
>> Mutexes and semaphores are similar, and are synchronization mechanisms
>> between either threads or processes.  Please look up the definition
>> and explanation of "critical section" in programming.
>>
>> The idea is to have a flag that can be changed without interference
>> from another process, or for that matter, can be read without
>> interfering with another process.  This could be a complete message.
>>
>> The mutexes and semaphores serve to synchronize two processes which,
>> by the very nature of an operating system, *cannot* be guaranteed to
>> be synchronous.
>>
>>
>> >
>> >I have thought about a lot of different approaches, and I'm not saying my
>> >approach won't change. This is just where I am right now: stumbling about,
>> >learning the various Linux APIs / libraries. Using and understanding
>> >fork() is on my TODO list; I just have not made it there yet. These two
>> >processes are actually two separate executables. I am a bit worried about
>> >process context switching though. I mean, I'm sure I am incurring some penalty
>> >right now running two separate executables, but I'm not sure it would be
>> >the same using threads.
>>
>> It actually would be the same with threads vs. processes.  The only
>> real difference is that the threads share the same address space as
>> each other, so they have access to variables without a special
>> mechanism (which would take time).
>>
>> Processes, as I mentioned, run in their own worlds, with the operating
>> system controlling what they see (resources, shared memory, etc). That
>> mechanism has overhead.
>>
>> So yes, threads are faster than processes, but more dangerous.
>>
>> Harvey
>>
>> >
>> >On Sun, Aug 23, 2015 at 1:42 PM, William Hermans <[email protected]>
>> wrote:
>> >
>> >> *1) what stops process A from writing to the shared buffer if process
>> B*
>> >>> * is reading it?*
>> >>
>> >>
>> >> Nothing. I assume that writes are slower, or at most as fast as reads.
>> >> Both reads, and writes are done using a mmap'd pointer.
>> >>
>> >> *2) what keeps B from getting an incomplete or inaccurate value from*
>> >>> * process A for the byte position?  is it a byte variable or is it an*
>> >>> * integer?  Does the processor write this as an integer in one*
>> >>> * uninterruptible process?*
>> >>>
>> >>
>> >> Aside from the fact that the byte position I'm testing here is a source
>> >> ID of two different devices, nothing. They do come in order, one after
>> >> the other. This is not permanent, however. When I start tracking
>> >> more data, this will still work for one set of data, but not for other sets
>> >> of data. The write / read type is char. There is really no way to get this
>> >> wrong, as gcc -Wall will warn. I have no errors or warnings when compiling.
>> >>
>> >> 3) if both A and B access Internet devices (over the same interface
>> >> I'd guess), what stops the data collision between process A and
>> >> process B?  What protects that Internet resource?  What is the result
>> >> if both A and B read a status register at the same time (in the
>> >> hardware)?
>> >>
>> >> No. I guess more correctly they are socket devices. Both using Linux
>> >> network sockets. socketcan for CANBus, and standard Linux sockets for
>> >> ethernet. The web libraries I did not write. It's libmongoose.
>> >>
>> >> On Sun, Aug 23, 2015 at 1:06 PM, Harvey White <[email protected]>
>> >> wrote:
>> >>
>> >>> On Sun, 23 Aug 2015 11:44:13 -0700, you wrote:
>> >>>
>> >>> >Ok. In my case however -
>> >>> >
>> >>> >Process A writes to shared memory only.
>> >>> >Process B Reads from shared memory only.
>> >>>
>> >>> Ok, so that eliminates one form of data corruption.
>> >>> >
>> >>> >As it stands, Process B starts off with a variable set to 0x00, then
>> >>> >compares this to a byte position in the file. When Process B first starts,
>> >>> >this comparison will always fail. Process B then copies the contents of the
>> >>> >file, sets the variable to the value at the byte position, and
>> >>> >sends the data out over a websocket.
>> >>>
>> >>> Ok:
>> >>> 1) what stops process A from writing to the shared buffer if process B
>> >>> is reading it?
>> >>>
>> >>> 2) what keeps B from getting an incomplete or inaccurate value from
>> >>> process A for the byte position?  is it a byte variable or is it an
>> >>> integer?  Does the processor write this as an integer in one
>> >>> uninterruptible process?
>> >>>
>> >>> 3) if both A and B access Internet devices (over the same interface
>> >>> I'd guess), what stops the data collision between process A and
>> >>> process B?  What protects that Internet resource?  What is the result
>> >>> if both A and B read a status register at the same time (in the
>> >>> hardware)?
>> >>>
>> >>> Harvey
>> >>>
>> >>>
>> >>>
>> >>> >
>> >>> >On the next iteration of the loop, Process B reads this value
>> >>> >again and makes the comparison, which will likely succeed. The loop
>> >>> >then continues until this comparison fails again, at which point the logic
>> >>> >repeats. It's pretty simple - Or so I thought.
>> >>> >
>> >>> >The reasoning for this development model is simple: code segregation. Code
>> >>> >in process B does not play well with the code in process A. They're both
>> >>> >accessing network devices, and when that happens simultaneously, data gets
>> >>> >lost. Which happens more often than not.
>> >>> >
>> >>> >On Sun, Aug 23, 2015 at 9:39 AM, Harvey White <
>> [email protected]>
>> >>> >wrote:
>> >>> >
>> >>> >> On Sun, 23 Aug 2015 08:52:53 -0700, you wrote:
>> >>> >>
>> >>> >> >Hi Harvey,
>> >>> >> >
>> >>> >> >Thanks for the response. I think the biggest question in my mind is: OK,
>> >>> >> >so perhaps I have a synchronization problem that rears its head once in a
>> >>> >> >while. But is this really that much of a problem, one which may cause both
>> >>> >> >processes to stop?
>> >>> >> >
>> >>> >> >A sample here and there that does not display because it
>> >>> >> >is malformed does not bother me. The processes stopping does. I cannot
>> >>> >> >see how this could be causing the processes to stop. However . . . I
>> >>> >> >honestly do not know one way or the other.
>> >>> >>
>> >>> >> Process A: while process B is busy, wait, then read from process B
>> >>> >>
>> >>> >> Process B: while process A is busy, wait, then read from process A
>> >>> >>
>> >>> >> Classic deadlock.
>> >>> >>
>> >>> >> Process A: wait for permission to read special area, read, then wait
>> >>> >> outside that permission area.  No restrictions on process B except
>> >>> >> when accessing special area (which happens infrequently).
>> >>> >>
>> >>> >> Process B: wait for permission to read special area, read, then wait
>> >>> >> outside that permission area.  No restrictions on process A except
>> >>> >> when accessing special area (which happens infrequently).
>> >>> >>
>> >>> >> Since the waiting is outside that special area, and the processes are
>> >>> >> not allowed to hog the special area (and block the other process),
>> >>> >> then neither process can block the other except for a very brief time.
>> >>> >>
>> >>> >> The implication is that the process check and access special area
>> >>> >> takes a very small time, and the wait/do something else part takes a
>> >>> >> longer time.
>> >>> >>
>> >>> >> Harvey
>> >>> >>
>> >>> >> >On Sun, Aug 23, 2015 at 8:43 AM, Harvey White <
>> [email protected]
>> >>> >
>> >>> >> >wrote:
>> >>> >> >
>> >>> >> >> On Sun, 23 Aug 2015 08:25:02 -0700, you wrote:
>> >>> >> >>
>> >>> >> >> >Hi Przemek,
>> >>> >> >> >
>> >>> >> >> >*Since this involves two processes that as you say stop
>> >>> >> simultaneously,*
>> >>> >> >> >> * I'd suspect a latent synchronization bug. You don't say how
>> >>> you*
>> >>> >> >> >> * interlock your shared memory,  but one possibility is that
>> your
>> >>> >> >> reader*
>> >>> >> >> >> * code gets stuck because you overwrite the data while it's
>> >>> reading
>> >>> >> it.*
>> >>> >> >> >> * Debugging this type of thing is tricky, but maybe write a
>> >>> state*
>> >>> >> >> >> * machine that lights some LEDs that show the phases of your*
>> >>> >> >> >> * synchronization process, and wait to see where it's stuck.*
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >Currently, I have no synchronization. At one point I was using a byte in
>> >>> >> >> >shared memory as a binary stopgap, but after a while it was not working
>> >>> >> >> >predictably. Now, I'm re-reading documentation on POSIX semaphores, and
>> >>> >> >> >creating a semaphore in shared memory, instead of using a system wide
>> >>> >> >> >resource.
>> >>> >> >>
>> >>> >> >> Then you have two things that happen with no predictable time
>> >>> >> >> relationship to each other at all.
>> >>> >> >>
>> >>> >> >> You could be writing part of a multibyte message when trying to read
>> >>> >> >> that message with another process.
>> >>> >> >>
>> >>> >> >> A binary semaphore controls access to the shared (message) resource.
>> >>> >> >> Checking the binary semaphore generally involves turning off
>> >>> >> >> interrupts so that the other process can't grab control during the
>> >>> >> >> check code.  If you have two separate processors, you still need to
>> >>> >> >> deal with the same thing, not so much interrupts, but permission to
>> >>> >> >> access.  The semaphore read/write must be atomic, and the access must
>> >>> >> >> be negotiated between the two processors (generally happens in
>> >>> >> >> hardware for two processors, happens in software for two processes
>> >>> >> >> running on the same processor).
>> >>> >> >> >
>> >>> >> >> >*I'd definitely look at this malformation---it could be the
>> smoke
>> >>> from*
>> >>> >> >> >> * the real fire. Or not. In any case, this one should be
>> easier
>> >>> to*
>> >>> >> >> >> * find---just wait for the message, inspect the data in
>> firebug,
>> >>> and*
>> >>> >> >> >> * write a checker routine, inspecting your outgoing data,
>> that
>> >>> >> watches*
>> >>> >> >> >> * for this type of distortion. *
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >The first thing that comes to mind here, which I forgot to add to my post
>> >>> >> >> >last night, is that I am not zeroing out the shared memory file before
>> >>> >> >> >usage. I know this is bad . . . but I am not convinced this is what the
>> >>> >> >> >problem is. However, since it is / can be a one-line code fix, I will do
>> >>> >> >> >so. The odd thing here is that I get maybe 1-2 notifications an hour - if
>> >>> >> >> >that. Then it is inside the actual json object ( string pointer - e.g. char
>> >>> >> >> >*buffer ) - not outside.
>> >>> >> >> >
>> >>> >> >> >What does all this mean to me? The first impression that I get out of this
>> >>> >> >> >is that it is a synchronization issue. I'm still not convinced though . . .
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >> analyze the code to see what happens if one process is writing
>> while
>> >>> >> >> the other is reading.
>> >>> >> >>
>> >>> >> >> The error rate may be just a measure of how frequently this
>> happens.
>> >>> >> >>
>> >>> >> >> Harvey
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> >Also, for what it's worth, I'm using mmap() and not file open(), read(),
>> >>> >> >> >write(). So the code is very fast.
>> >>> >> >> >
>> >>> >> >> >On Sun, Aug 23, 2015 at 6:40 AM, Przemek Klosowski <
>> >>> >> >> >[email protected]> wrote:
>> >>> >> >> >
>> >>> >> >> >> On Sun, Aug 23, 2015 at 1:31 AM, William Hermans <
>> >>> [email protected]>
>> >>> >> >> >> wrote:
>> >>> >> >> >> > So I have a problem with some code I've been working on for the last few
>> >>> >> >> >> > months. The code, which is compiled into two separate processes, suddenly
>> >>> >> >> >> > stops working. No error, nothing in dmesg, nothing in any file in /var/log,
>> >>> >> >> >> > period. It did, however, occur to me that rsyslog is likely or possibly
>> >>> >> >> >> > disabled.
>> >>> >> >> >> >
>> >>> >> >> >> > What my code does is read from the CAN peripheral, form extended packets out
>> >>> >> >> >> > of the CAN frames ( NMEA 2000 fastpackets ), and then write the data into a
>> >>> >> >> >> > POSIX shared memory file ( /dev/shm/file ).
>> >>> >> >> >>
>> >>> >> >> >> Since this involves two processes that as you say stop
>> >>> >> simultaneously,
>> >>> >> >> >> I'd suspect a latent synchronization bug. You don't say how
>> you
>> >>> >> >> >> interlock your shared memory,  but one possibility is that
>> your
>> >>> >> reader
>> >>> >> >> >> code gets stuck because you overwrite the data while it's
>> >>> reading it.
>> >>> >> >> >> Debugging this type of thing is tricky, but maybe write a
>> state
>> >>> >> >> >> machine that lights some LEDs that show the phases of your
>> >>> >> >> >> synchronization process, and wait to see where it's stuck.
>> >>> >> >> >>
>> >>> >> >> >> > The second process simply reads
>> >>> >> >> >> > from the file, and shuffles the data out over a websocket in json / human
>> >>> >> >> >> > readable form. The data on the web side of things is tested accurate,
>> >>> >> >> >> > although I do occasionally get a malformed json object warning from Firefox
>> >>> >> >> >> > Firebug.
>> >>> >> >> >>
>> >>> >> >> >> I'd definitely look at this malformation---it could be the
>> smoke
>> >>> from
>> >>> >> >> >> the real fire. Or not. In any case, this one should be
>> easier to
>> >>> >> >> >> find---just wait for the message, inspect the data in
>> firebug,
>> >>> and
>> >>> >> >> >> write a checker routine, inspecting your outgoing data, that
>> >>> watches
>> >>> >> >> >> for this type of distortion.
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> For more options, visit http://beagleboard.org/discuss
>> >>> >> >> >> ---
>> >>> >> >> >> You received this message because you are subscribed to the
>> >>> Google
>> >>> >> >> Groups
>> >>> >> >> >> "BeagleBoard" group.
>> >>> >> >> >> To unsubscribe from this group and stop receiving emails
>> from it,
>> >>> >> send
>> >>> >> >> an
>> >>> >> >> >> email to [email protected].
>> >>> >> >> >> For more options, visit https://groups.google.com/d/optout.
>> >>> >> >> >>
>> >>> >> >>
>> >>> >>
>> >>> >>
>> >>>
>> >>>
>> >>
>> >>
>>
>>
>
>

