Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Graeme Foot Tue, 06 Mar 2018 13:53:25 -0800

Hi Gavin,

I have attached updated eoe patches to maintain compatibility with your 
patchset for non-RTAI / RTDM applications.


Patch 0001:
This will now continue with the existing behaviour of auto-create and 
auto-delete of eoe ports as the eoe slaves are detected and removed.  It does 
contain the fix for removing the lock from the transmit callback so eoe users 
will probably want this update even if they want to keep existing behaviour.

Patch 0002:
Still only targeted at RTAI / RTDM users, but will now maintain existing 
behaviour (internal locking via the internal callbacks) for Linux user space 
applications.


Regards,
Graeme.

-----Original Message-----
From: etherlab-dev [mailto:etherlab-dev-boun...@etherlab.org] On Behalf Of 
Graeme Foot
Sent: Tuesday, 6 March 2018 1:40 PM
To: Esben Haabendal <esben.haaben...@gmail.com>
Cc: etherlab-dev@etherlab.org
Subject: Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Hi,

The email's getting a little hard to read so I'll try to summarise what I'm 
doing here first and reply to specifics below.

The primary problem with RTAI is that you cannot use Linux locks in RTAI hard 
realtime calls.  So the base 0017 and 0018 patches are useless for RTAI.  
Secondly RTAI / RTDM cannot use callbacks, so EoE does not work for this 
framework.

I put the patches in the features branch of Gavin's patchset as they should be 
optional and especially patch 2 (0002-eoe-via-rtdm.patch) is specifically for 
RTAI / RTDM for EoE.  I haven't looked too hard at making this one generic as 
it does break the previous patches I mentioned (base 0017 and 0018).  But it 
also needs to break them as the whole point of it is to use an application 
level lock (i.e. RTAI lock) instead (as per the design of the vanilla Etherlab 
master).  So no, the patch does not support "plain" user-space applications 
(i.e. not using RTDM) as my focus was only for RTDM applications.  Patch 2 
could probably have a few more defines to make sure it only affects RTDM users 
and keeps 0017/0018 active for non-RTDM installs.

patch 1 (0001-eoe-addif-delif-tools.patch) only looks at changing the lifetime 
options of the eoe interfaces so anyone should be able to use this.

patch 2 (0002-eoe-via-rtdm.patch) is targeted to add the ability for RTDM users 
to provide a user space alternate to the callbacks framework.  If people prefer 
this method to allow "plain" Linux user space applications to supply their own 
locking then it could be extended.  But it does mean that the base patches 
"0017-Master-locks-to-avoid-corrupted-datagram-queue.patch" and 
"0018-Use-call-back-functions.patch" do need to be removed.

So using patch 2 is a matter of preference: Patch 2 uses the Vanilla EtherLab 
EtherCAT master approach of "the user application knows best" as to when a lock 
can be acquired, but also covers the "realtime RTAI can't use Linux locks" 
issue; OR don't use patch 2 and go with the patches 0017/0018 approach where 
the master handles this locking (which as I said is not an option for RTAI).



> From: Esben Haabendal [mailto:esbenhaaben...@gmail.com] On Behalf Of 
> Esben Haabendal
> 
> Hi Graeme
> 
> Need to get one thing straight first.
> 
> RTDM user-space is not the same as "plain" Linux user-space.
> I know that you are 100% aware that, but your replies to me seems less clear 
> about it.
> 
> Your 0002-eoe-via-rtdm.patch does not seem usable for "plain" user-space 
> applications, so does not apply to the applications we support.
> 
> >> > These 4 functions are special.  The master should be written (and 
> >> > mostly seems to be) in a fashion that between a
> >> > ecrt_master_activate() and ecrt_master_deactivate() the above 
> >> > calls do not require any locks to synchronize with the master 
> >> > code, except for the EOE thread which uses callbacks.
> >> 
> >> Well, mostly is the same as not in my world.  If they mostly do not 
> >> require any locks, they do require locks.
> >
> > When I say mostly, I mean the vanilla Etherlab version is ok, but 
> > the patch that added ec_ioctl_lock_up() and ec_ioctl_lock_down() 
> > broke this.  I do not use this patch.
> 
> Do you mean that the patches (and I guess you mean 
> 0017-Master-locks-to-avoid-corrupted-datagram-queue.patch and
> 0018-Use-call-back-functions.patch) introduces the need for locking?
> 
> That does not sound right to me.  As I read them (and your writing in this 
> thread), the need for locking is there with and without these patches.  
> Without them, locking needs to be handled at application level.  With the 
> patches, the master tries to handle the locking.
> 
> And that is a clearly a design change, and one which you clearly does not 
> agree with.
> 
> But if you really mean that these patches broke etherlabmaster (and not 
> "just" changed the design), would you please explain it a bit more detailed?
> 

Locking is required if multiple processes / threads will access these 
functions, regardless of which method you prefer.

Vanilla EtherLab approach is, user application knows best when to lock, leave 
it up to them.  But EoE thread is running in the master so allow the user 
application to provide some callbacks so that the EoE thread can obtain the 
lock in the user application and call the appropriate send/receive function.

Gavin's patchset approach (base patches 0017/0018) is to change this to provide 
the locking within the master.
- Problem 1) Patch base/0017 uses master->master_sem which can be held by a 
non-realtime process
- Problem 2) Patch base/0017 uses master->master_sem which can't be used by 
RTAI, so for RTAI it is disabled anyway
- Problem 3) the callbacks are designed to allow the EoE send/receive to be 
synced with your application.  Patch base/0018 repurposes this so that all user 
space ecrt_master_send() calls use the send callback and user space 
ecrt_master_receive() calls use the receive callback.  The send callback is 
designed to call ecrt_master_send_ext() specifically for EoE processing, not 
ecrt_master_send().  The receive callback is designed to call 
ecrt_master_receive() so no real problem there.

Note: ecrt_master_send_ext() will queue any external datagrams and then call 
ecrt_master_send().  The drawbacks to calling ecrt_master_send_ext() rather 
than ecrt_master_send() from your main send/receive loops are:
1) If you have a very fast and tight send/receive loop this may add extra 
overhead you don't want
2) You can get extra jitter between calling ecrt_master_application_time() and 
the frame going onto the wire


> >> As for the EOE thread, I might be overlooking something obvious.  
> >> But how are you supposed to use callbacks when using the library API?
> >> 
> >> As far as I read the code, if you are using EoE and not RTDM, you 
> >> will always use the standard ec_master_internal_send_cb() and
> >> ec_master_internal_receive_cb() callbacks.  See ec_ioctl_activate().
> >
> > If you do not set EoE callbacks (with ecrt_master_callbacks()) there 
> > will be NO callbacks once the master goes active.  (It will not use
> > ec_master_internal_send_cb() and ec_master_internal_receive_cb().  
> > It will not start an EoE processing thread.  You will get a "No EoE 
> > processing because of missing callbacks!" message in your system 
> > log.)
> 
> In ec_ioctl_activate() there is:
> 
> #ifndef EC_IOCTL_RTDM
>     ecrt_master_callbacks(master, ec_master_internal_send_cb,
>             ec_master_internal_receive_cb, master); #endif
> 
> So at least for (non RTDM) user-space applications, that activate the 
> master with ioctl, EoE callbacks are always set to
> ec_master_internal_send_cb() and ec_master_internal_receive_cb().
> 

Sorry, I misread your comment and the code.  I'm using RTDM so don't notice 
this.

For RTAI/Xenomi the internal callbacks are not allowed as they use 
master->io_sem which is a non-RTAI compatible semaphore.  The reason for patch 
base/0018 is to allow master->io_sem to be shared for the user space 
send/receive calls and the EoE thread (which will be created as the callbacks 
are defined).  But it will break if your application also uses a kernel 
process, as it will not be using the master->io_sem semaphore (not likely I 
know).

The above ecrt_master_callbacks() in ec_ioctl_activate() was added way back in 
2012 (rev 2433) when RTDM became more fully supported.  I don't know what 
behaviour it had before that but that is probably irrelevant as it has been 
like this for so long.


> > If your application is in kernel space then you can specify callback 
> > functions and then use application level locking.  If your 
> > application is in user space, then you cannot specify callback 
> > functions as the kernel cannot call back into user space.  This is why I 
> > created my latest patch "0002-eoe-via-rtdm.patch".
> > This patch lets you create your own EoE processing thread in your 
> > own application.  This of course also lets you use application level 
> > locking.
> 
> But only for RTDM.  The functions added are guarded by
> 
> #if !defined(__KERNEL__) && defined(EC_RTDM) && (EC_EOE)
> 
> which means that they do not apply when using non-RTDM user-space.
> 

Correct.


> >> And with them, I don't see how you are supposed to do locking at 
> >> the application level.  I will not be able to specify application 
> >> level locks that are used by both application and eoe, unless they 
> >> are implemented directly in the master (kernel).
> >> 
> >> And with ec_master_eoe_thread() running inside master, you really 
> >> need to do locking to synchronize access to master->datagram_queue 
> >> (i.e. the io_sem lock).  If not, you will have race conditions 
> >> between eoe and 3 of the 4
> >> functions:
> >
> > The io_sem lock is only used by the master when it is idle.  Once 
> > you activate the master it is up to your application to provide the locking.
> 
> Ok.  But I dare to say that it is an open question if that is an design for 
> etherlabmaster.  I believe it has both pros and cons.
> 
> Pro: Application developers can implement synchronization as they find most 
> optimal for them.
> 
> Cons: Application developers need to figure out and implement synchronization 
> without much (any) help from the etherlabmaster code.
> And getting synchronization is IMHO something that is often tricky to get 
> right (at least for many developers).
> 

Generally agree, but RTAI can't use EtherCAT master locking in hard realtime, 
so it needs an alternate solution anyway.  And the vanilla Etherlab design is 
to use application level locking (except for the ecrt_master_callbacks() call 
in ec_ioctl_activate() which I suspect snuck through).


> >>    ecrt_master_send()
> >>    ecrt_master_receive()
> >>    ecrt_domain_queue()
> >> 
> >> So we really do not to do locking in at least these 3 when using EOE.
> >> 
> >> Or maybe the whole master->datagram_queue should be refactored to 
> >> something that can be synchronized safely in a lock-free way?
> >
> > The point of the callback functions is so that you can make sure 
> > your application does not call any of ecrt_master_receive(), 
> > ecrt_master_send(), ecrt_master_send_ext(), ecrt_domain_process(),
> > ecrt_domain_queue() for a given master at the same time.  If you 
> > don't use EoE you don't need the callbacks, but you still need to 
> > protect ecrt_master_receive(), ecrt_master_send(),ecrt_domain_process(), 
> > ecrt_domain_queue().
> 
> Which I personally see as a trivial omission in the etherlabmaster design 
> (requiring all API users to spend time and code writing the same code for 
> protect the API from itself).
> 

Not a trivial omission as locking can't be done in the master for RTAI for hard 
realtime calls, so it need an alternative.  Also I suspect most simple 
applications only require one send/receive loop, so don't require locking 
anyway.  I've only needed to start adding locks since starting to use EoE.


> >> > And the reason it uses callbacks is because only your application 
> >> > knows if it is appropriate to allow the EOE thread to call
> >> > ecrt_master_recieve() and ecrt_master_send_ext() etc at any 
> >> > particular time.
> >> 
> >> I see what you mean.  But I just don't see how it is possible to 
> >> accomplish this for user space applications.
> >> 
> >
> > Exactly why I create my patch "0002-eoe-via-rtdm.patch". 
> 
> And again, which does not extend to non-RTDM user-space.
> 

Agreed, but it could be, but that will require application level locks for 
those that don't currently use them (for non-RTDM user-space apps).


> >> > However, if your application has multiple send/receive loops then 
> >> > they need to be synchronized with each other (see next comment).
> >> > There are a few more functions such as the distributed clock 
> >> > calls that are also in the send/receive loops.  They also do not 
> >> > require ethercat level locks as they should be safe to call 
> >> > between the activate and deactivate.  They also do not need 
> >> > application level locking as they should really only be called by 
> >> > one send/receive loop per master.  All other ethercat function calls 
> >> > should not be in a hard realtime context.
> >> 
> >> Ok.  But you imply a struct rule for how to design applications 
> >> doing EtherCAT communication: "There can only be one send/receive 
> >> loop per master".
> >> 
> >> That might (is not in my case) accepted by all application developers.
> >> I need to continue support for applications that use multiple 
> >> send/receive loops for multiple EtherCAT domains, each with different 
> >> cycle time.
> >> 
> >> One solution could be to create a single send/receive loop per 
> >> master, and then let the multiple application loops communicate 
> >> with that instead of directly with the master.  That might actually 
> >> be a really good idea.  But, if that is done, I think it should fit 
> >> very well as a general feature of the EtherCAT master.  It should 
> >> be possible to implement without any application specific code.
> >
> > That's not what I said.  You can have multiple send/receive loops 
> > per master, but it is up to your application to make sure they don't 
> > call any of the ecrt functions I listed at the top at the same time 
> > (per master).  If your application is a single process with multiple 
> > threads you can use an application semaphore (for example), but if 
> > it is multiple processes you will need a named semaphore (or similar) that 
> > all of the processes share.
> 
> For the single process, multiple threads application, it is not that bad.  It 
> is requring all such application developers to do the same over and over 
> again, wasting time and introducing the same bugs again and again.
> 
> The multiple process world can be different though.  You basically end up 
> with a new EtherCAT API.  A combination of the etherlabmaster API and a 
> custom named semaphore API.  Without this, applications will not work 
> properly together.  Why not include such a feature directly in 
> etherlabmaster?  Without it, I think we are making the user-space 
> applications (non-RTDM) into a second-class citizen.
> 
> It might not matter to you, but as this how we use etherlabmaster, it matters 
> to me.
> 

It always maters and that's why the patch is optional.

For RTAI, application level locks is the only allowable case.  For non-RTAI 
apps either approach is possible, but most would probably prefer the built in 
EtherCAT master level locks (until they aren't powerful enough anymore).


> >> The final "problem" is as you say a design question.  Should 
> >> etherlabmaster support synchronization of access from multiple 
> >> threads/processes, or should this be left entirely to the 
> >> application developer.  But if
> >> master->datagram_queue is changed to something lock-free, this "problem" 
> >> master->will
> >> likely be solved also.
> >> 
> >> >> > However, if they are being called from multiple processes 
> >> >> > (rather than
> >> >> > threads) then you need to use something like a named semaphore 
> >> >> > so that all processes share the same lock.  Of course if you 
> >> >> > are using callbacks (for EoE) you are probably doing that anyway.
> >> >> 
> >> >> You easily have We have multiple processes.  Even with just a 
> >> >> single application, you have EtherCAT-OP, EtherCAT-EoE and the 
> >> >> application.
> >> >> All using some of the same shared data structures.  Throwing 
> >> >> more multiple application just adds to that, but I think it is 
> >> >> critical enough with just a single application.
> >> >
> >> > Even if you have multiple processes (rather than threads), it is 
> >> > your design decision as to which process takes priority and 
> >> > whether a particular send/receive loop should wait a bit even if 
> >> > it could get the lock now.  You can only do that if your 
> >> > application controls the locking of these functions.
> >> 
> >> While that sounds reasonable, I believe it is not entirely true.  
> >> You could write a layer responsible for handling this, which gets 
> >> information from the
> >> application(s) for how to deal with this.  Think of it like a kind 
> >> of scheduler.
> >> 
> >> Example from an application I have seen.  You have two applications 
> >> (processes), each driving it's own EtherCAT domain.  One 
> >> application/domain is running every 2 ms, the other 
> >> application/domain every 10 ms.  In addition, EoE slaves are also 
> >> used.  For each application, you could specify the priority, the 
> >> cycle time and the start time.  EoE should also be given a priority.
> >> This EtherCAT "scheduler" would then be able to decide which EtherCAT 
> >> communication to do and when.
> >> 
> >> So while I agree that it might be most obvious to implement the 
> >> decission of when to send/receive directly in the application, I 
> >> think it could also be implemented in a more generic way.
> >
> > You don't need the EtherCAT master to do that scheduling.  That is 
> > already available in Linux / RTAI or whatever you use.
> 
> What do you mean?
> 

In Linux you can give processes various priorities (via its nice value).  For 
RTAI you can also give each task a priority, but I think it is more strict that 
standard Linux where higher priority tasks will fully run before lower priority 
tasks.  If you have a single CPU (or single allocated CPU to all your EtherCAT 
processes) then the higher priority process will get more CPU and be woken up 
before a lower priority process if they are both waiting on a lock.  Even with 
multiple CPU's the higher priority task should always get a lock before a lower 
priority task.

So you don't need the EtherCAT master to have a scheduler on top of this.


> > But once one of the above ecrt function calls is in progress you 
> > need to ensure no other process / thread will call one at the same time.
> 
> Yes, that is your preferred design (that **I** need to ensure that).  I 
> prefer a design where the master will do that for me (actually the 
> application developers using our platform).
> 

For RTAI its not preferred, it's required.  Can't use Linux locks in RTAI hard 
realtime calls.


> >> >> > They should probably also be turned off for RTAI / Xenomi in 
> >> >> > general and as I said above use application level locking.
> >> >> >
> >> >> > You can pass --enable-rtdm when compiling to enable RTDM (and 
> >> >> > compile rtdm-ioctl.o).
> >> >> 
> >> >> Passing --enable-rtdm to ./configure will define EC_RTDM macro 
> >> >> and enable the automake conditional ENABLE_RTDM.
> >> >> 
> >> >> This will trigger Kbuild to build rtdm-ioctl.o (from 
> >> >> rtdm-ioctl.c), and this will be done with EC_IOCTL_RTDM macro defined.
> >> >> 
> >> >> But ioctl.c will be compiled as always, and that is without 
> >> >> EC_IOCTL_RTDM defined.  Was it supposed to be defined?  If so, 
> >> >> it should be easy to fix, but someone should definitely do some 
> >> >> real testing of it.
> >> >
> >> > I think the locks should be disabled in both "rtdm-ioctl.c" and 
> >> > "ioctl.c" if using RTAI, but I use RTDM so haven't confirmed this.
> >> 
> >> I don't see how it could ever be disabled for ioctl.c with the current 
> >> code.
> >
> > By not using the patch that added them.
> 
> Of-course.  But back to the same discussion again.
> 
> >> > Further to that I don't think they should be there at all.  
> >> > Simple applications have one send/receive loop so don't need locks.
> >> > Applications with multiple send/receive loops or EOE need to 
> >> > control their own locking for optimal results anyway.
> >> 
> >> Again, I don't fully agree with you on that.
> >> 
> >> But more importantly, it is not possible for user-space 
> >> applications in combination with EoE, due to inability to set application 
> >> callbacks.
> >> 
> >> > Also, having these lock functions use master_sem the send/receive 
> >> > functions block unnecessarily with non-realtime ethercat function 
> >> > calls.  At a minimum they should be locking on their own semaphore.
> >> 
> >> True.  I have patches for fixing that.  The master_sem is 
> >> definitely a no-go for real-time.
> >
> > So, I believe the following two patches are the problem:
> > - base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch
> > - base/0018-Use-call-back-functions.patch
> 
> Well, that depends on how you look at it.  Removing those patches removes the 
> problem from etherlabmaster.  But then I just have to do exactly the same 
> thing on top of etherlabmaster, and I will be back to square one.
> 
> > My new EoE patch rolls back the changes from 
> > "base/0018-Use-call-back-functions.patch" and I use EC_IOCTL_RTDM so 
> > ignore the changes in 
> > "base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch".
> 
> Ok.  I think you could have been a bit more explicit about that roll-back.
> 
> /Esben
>


Regards,
Graeme.
_______________________________________________
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev

0001-eoe-addif-delif-tools.patch
Description: 0001-eoe-addif-delif-tools.patch

0002-eoe-via-rtdm.patch
Description: 0002-eoe-via-rtdm.patch

_______________________________________________
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev

Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Reply via email to