On Tue, 7 May 2013 09:43:30 -0500 Serge Hallyn <serge.hal...@ubuntu.com> wrote:
> Quoting Dwight Engen (dwight.en...@oracle.com): > > On Mon, 6 May 2013 19:08:08 -0500 > > Serge Hallyn <serge.hal...@ubuntu.com> wrote: > > > > > Quoting Dwight Engen (dwight.en...@oracle.com): > > > > On Mon, 6 May 2013 15:31:14 -0500 > > > > Serge Hallyn <serge.hal...@ubuntu.com> wrote: > > > > > > > > > Quoting Dwight Engen (dwight.en...@oracle.com): > > > > > > On Mon, 6 May 2013 13:06:43 -0400 > > > > > > Dwight Engen <dwight.en...@oracle.com> wrote: > > > > > > > > > > > > > Hi Çağlar, > > > > > > > > > > > > > > Thanks for the test program, I can sort of recreate it > > > > > > > here with that, although once I lxc-stop them all, > > > > > > > lxc-monitord does go away. I put a debug in lxc_wait() to > > > > > > > see that the client always close the fd to the monitord > > > > > > > and they all were closed so I'm not sure why lxc-monitord > > > > > > > isn't seeing the accepted fd coming back from epoll to > > > > > > > close. Still investigating... > > > > > > > > > > > > Okay, so I debugged this and the problem is basically down > > > > > > to lxc not being thread aware. With your test program we get > > > > > > multiple threads in lxcapi_start() simultaneously in the > > > > > > daemonize case. One of them forks while another one has the > > > > > > client fd to the monitor open and thus the fd gets duped by > > > > > > the fork and that is the client fd that holds lxc-monitord > > > > > > open until the container shuts down. > > > > > > > > > > > > Çağlar you could try out the following patch, it essentially > > > > > > serializes container startup from a thread perspective. I > > > > > > haven't tested it thoroughly, but it did fix the problem > > > > > > here. Right now lxc doesn't support threaded use, so you > > > > > > may run into other things as well. Depending on our stance > > > > > > on thread support in lxc, you may need to do the > > > > > > serialization in the threaded app. I guess another > > > > > > alternative is that initially we could just thread > > > > > > serialize at the API (big lxc lock). > > > > > > > > > > It sounds like lxcapi_start should be locking c->slock? The > > > > > priv_lock is to protect the struct lxccontainer in memory (for > > > > > when you do c = lxc_container_new(); then clone a new thread), > > > > > while the slock is to protect the container data. It's being > > > > > taken at create, info, destroy, etc, but not at start. > > > > > > > > Hi Serge, thanks for pointing those out, lxc is more thread > > > > aware than I realized :) Unfortunately I don't think either > > > > lock will help here as both these locks are per-container and > > > > the data we need to synchronize is per-calling process (ie. > > > > which fd's are open at time of fork). To put it another way, > > > > this problem happens even when each c is different. > > > > > > Right you are. I was so busy worrying about how to protect the > > > on-disk container and in-memory container_struct I wasn't thinking > > > about the shared fdtable. > > > > > > Should we have a single mutex around all fd modifications? What > > > else? > > > > I think that will certainly work here, I haven't considered other > > places. Do we need to think about sighand and umask? Would you > > prefer > > Yeah we should. > > (And I need to decide how to fix the fact that c->slock is just called > the container name, regardless the lxcpath) > > > to use the semaphores lxc already has (ie lxc_newlock() with no > > name) > > Actually we're doing different things there, so I'm happy to use > pthread_mutex, unless it'll cause trouble for people using different > toolchains (i.e. android/bionic). The semaphores we're using are to > protect the container_struct itself, and to protect the on-disk > container data. For protecting the task data itself, pthread_mutex > seems perfect. (And no name is needed.) > > > over pthread_mutex? Since Çağlar verified the test fix worked for > > him also I'd like to make a real fix. > > Should I push the patch from yesterday as a first step, or would you > prefer to do all at once with one patch? I'm fine either way. Hi Serge, after testing this a bit more and the fact that you're fine using pthread_mutex, I think its okay as is since it fixes the immediate problem. If/When we find more process things that need thread locking we can rename the lock and make it more generic. If you want me to submit a regular git patch with sign off and all just let me know. > thanks, > -serge > > PS - I'm in meetings all week (a very rare thing for me), so apologies > if I'm slow in responding. ------------------------------------------------------------------------------ Learn Graph Databases - Download FREE O'Reilly Book "Graph Databases" is the definitive new guide to graph databases and their applications. This 200-page book is written by three acclaimed leaders in the field. The early access version is available now. Download your free book today! http://p.sf.net/sfu/neotech_d2d_may _______________________________________________ Lxc-devel mailing list Lxc-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/lxc-devel