Hello Jeff,

First, thanks for the initiative. I read your e-mail with great interest. It comes at the right time and covers some latent issues.
Here are some comments from a tracing perspective (both kernel and user space). I think you will find that there is a lot of commonality between profiling and tracing, although some of it is quite specific.

[1] To launch or not to launch...

As you probably know, instead of using the TCF launcher facility, we went for a TCF connector (in RSE) on the host and a TCF agent on the target node. That agent implements the set of commands we need - configure, control, trace retrieval, etc. - over TCF. The agent is started manually. Of course, the degenerate case of a "remote target" is the local host. By configuring, we mean enabling/disabling static tracepoints.

Note that the LTTng agent has to run as root - at least for kernel tracing - which is an annoying security issue for us. We hope to alleviate this problem eventually (famous last words), at least for UST. LTTng 2.0 should also help.

The RSE/TCF connector allows, on top of tracer control, navigation of the remote file system, visualization/killing of remote processes, starting a shell/terminal, etc., from a single view (Remote Systems). When you have multiple targets, we also think it would be nice to have everything centralized.

However, the main reason for opting for a connector was that LTTng kernel tracing is not about tracing an ordinary application (obviously): uploading and starting an instrumented kernel was not one of our use cases. In other words, since the instrumented code (the kernel) is already on the target, there is no need to upload anything - just hook into it and configure it live. Note that this will also be the case for embedded software and server-like applications, another important use case for us.

The story takes a new twist when we introduce UST (User Space Tracing): in that case, it makes sense to instrument and cross-compile your code on the host and then upload the binary to the target. After uploading, the binary can be configured and then run.
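To make the "hook to it and configure it live" part concrete, here is a minimal sketch of the kind of configure/control/start command set I have in mind for such an agent. Everything here is invented for illustration - the JSON protocol, the command names and the tracepoint names are not the real TCF services our agent implements:

```python
import json

# Toy stand-in for the tracer-control agent on the target. The real
# agent speaks TCF; this only illustrates the command categories we
# need: configure (enable/disable tracepoints), control (start/stop).
class TracerAgent:
    def __init__(self, tracepoints):
        # All static tracepoints start disabled.
        self.tracepoints = {tp: False for tp in tracepoints}
        self.running = False

    def handle(self, message):
        cmd = json.loads(message)
        if cmd["op"] == "configure":
            # Cherry-pick tracepoints live, no upload needed: the
            # instrumented code is already on the target.
            for tp, enabled in cmd["tracepoints"].items():
                if tp in self.tracepoints:
                    self.tracepoints[tp] = enabled
            return json.dumps({"ok": True})
        if cmd["op"] == "start":
            self.running = True
            return json.dumps({"ok": True})
        if cmd["op"] == "stop":
            self.running = False
            return json.dumps({"ok": True})
        return json.dumps({"ok": False, "error": "unknown op"})

# Example session: enable one kernel tracepoint, then start tracing.
agent = TracerAgent(["sched_switch", "irq_entry", "irq_exit"])
agent.handle(json.dumps({"op": "configure",
                         "tracepoints": {"sched_switch": True}}))
agent.handle(json.dumps({"op": "start"}))
```

The point is only that "configure" is a separate, repeatable operation from "start", which is what makes live reconfiguration of a running kernel (or long-lived server process) possible.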
Trace configuration has to happen between uploading and starting the application since, once it runs, it might be a bit late to cherry-pick which tracepoints to enable. Anyway, a launcher definitely seems to be the way to go in that case.

In short, I see a need for:
- A launcher to upload, configure and start an instrumented application
- A scheme to connect to a running process and dynamically select the tracepoints to enable/disable via an agent. This could be initiated by a launcher, in lieu of the RSE/TCF connector scheme, but it would still have to be supplemented by the equivalent of a control view once the connection is established (like the Debug view when you launch a CDT debug session).

[2] The launching experience

I agree with you that the TCF setup should be pushed - or copied - into Profiling Configurations. When you say that "the TCF Targets page could be pushed into the framework", I assume you are referring to the Linux Tools profiling framework, right? I am not really familiar with that component. My understanding is that it provides a launching facility for profiling applications, either locally or remotely. I have no issue with integrating tracing into the profiling launch scheme (although I believe that profiling is a special case of tracing :-).

For the tracepoint selection, my preferred approach would be similar to Eclipse tracing today: list the available tracepoints in some project config file (e.g. .tracing) that would be used to populate a Tracing tab in the launcher, where tracepoints can be selectively enabled (just like a Workbench/Java/JUnit launch today). That should be a good start for tracing from a developer's standpoint.

As you also mention, I don't see that it makes immediate sense to deploy the agent using the launcher (unless you are designing an agent, I guess). Anyway, I agree with you that it is probably more sensible to deliver the agent as a package for your target.
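To illustrate the .tracing idea mentioned above: a rough sketch of how the launcher could read such a project file to populate the Tracing tab. The file format (one name=enabled pair per line) and the tracepoint names are assumptions for the example - we would still have to define the actual format:

```python
# Hypothetical ".tracing" project file: one "tracepoint = enabled" pair
# per line, '#' starts a comment. This format is an assumption, not an
# existing Linux Tools convention.
def parse_tracing_config(text):
    tracepoints = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        # The Tracing tab would show each name with a checkbox preset
        # from this flag, like the tracing page of an Eclipse launch.
        tracepoints[name.strip()] = value.strip().lower() in ("1", "true", "yes")
    return tracepoints

sample = """
# Tracepoints available in this project (illustrative names)
ust_baddr_statedump  = true
my_app:request_start = true
my_app:request_end   = false
"""
enabled = parse_tracing_config(sample)
```

The launcher would then pass only the enabled names to the agent's configure step before starting the uploaded binary.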
However, it might make sense to start the agent from the launcher. BTW, I really like the selection of available targets (in the Target tab) - thanks to port 1534, I guess. What I would also like is a way to query the discovered agents about their capabilities (profiling, code coverage, etc.), version, state, and so on. This could be very useful if we ever standardize on a TCF solution for Linux Tools remote stuff.

This is where it gets trickier: for tracing purposes, I would like the agents to be aware of the running instrumented apps and to be able to relay that information, along with the corresponding tracepoints, to the host (so the Tracing tab - where you enable your tracepoints - can be dynamically populated). This could really complicate the agent design...

[3] The certificate system

I haven't come anywhere close to that part of the code! From what you describe, I see a healthy mix of bugs and enhancement needs :-)

Potential bugs:
- the ignored alias name
- the hard-coded names with wrong capitalization
- the missing check that the private and public keys match

Enhancements:
- avoid the unnecessary regeneration of certificates at set-up
- simultaneous sessions from the same host (your proposed scheme looks good)

[4] Port 1534

That the launcher doesn't open the port looks like a shortcoming to me too.

[5] TCF as a root daemon

Like you, I see major security issues (in the general sense - not only intrusions) with this situation. When you suggest configuring the agent as root but starting it under a specified user's account, are you referring to a scheme like mysqld (started by mysql)? That would be cool (not to mention very Linuxy).

[6] Environment variables

Having no real experience with the TCF launcher yet, I'm afraid I don't fully grasp how annoying this can be. If it is what I suspect, i.e.
a way to configure your agent and ultimately control your app, then I see no reason not to preset these before launching, although this might complicate the launching protocol.

[7] Thoughts

I think you have pretty much covered the launching aspects for profiling. With some additions, I guess that tracing can also be well served, although there are still some question marks on my side (namely dynamic control after launching).

I also think that this work could be done in the spirit of Andrew's silo-breaking idea - a common agent that supports remote Linux Tools (at least some of them). It would indeed be very cool to have such an agent that supports Valgrind, GCov, GProf, etc. What extra functionality would be required to easily integrate more tools? It would be great to have comments from the other Linux Toolers on this one.

That's it for today (I have a build to test :-)

Best Regards,
/fc

On Wed, May 25, 2011 at 5:52 PM, Jeff Johnston <[email protected]> wrote:
> Hi Francois,
>
> I'd like to start a dialog with you regarding cleaning up the TCF
> implementation for remote profiling/tracing.
>
> The first thing I'd like to tackle is the set-up design. I am
> primarily interested in how this can be directed from the client. In
> the current design, remote set-up must occur via the Debug
> Configurations... dialog under TCF.
>
> There, one can find a list of targets. After unclicking the use-local
> options, the Add... button becomes active for the list of targets.
> Pressing this, a set of options are provided, one being the option to
> set up an agent remotely. Selecting this action, one gets a dialog to
> enter the root password and a separate userid/password for the remote
> system. This kicks off a remote shell that:
>
> 1. uninstalls the remote tcf-agent rpm and installs the
> "most-appropriate" tcf-agent rpm for the remote system
> 2. as part of install, runs tcf-agent to create a certificate which is
> passed back to the client and then invoked again.
> A public certificate copy is performed by both host and remote from
> each other.
> 3. starts the TCF agent as a root daemon
>
> There is a bug in the code whereby the certificate names set up are
> not in sync with the code that ultimately uses them (i.e. Local.cert
> vs local.cert).
>
> Once the remote shell is done, manual set-up of the agent via
> environment variables is still required, which is very annoying.
>
> Here's how I would like to change the mechanism.
>
> 1. Copy the TCF set-up into Profiling Configurations... The TCF
> Targets page could be pushed into the framework so each tool could
> reuse it.
>
> 2. I would like to cease the shipping of the rpm over to the remote
> system. Instead, I would like to create a new Fedora/RHEL package
> containing our TCF agent(s). That way it can be installed from the
> remote side using the normal mechanisms. This prevents the client from
> having to store rpms for all supported systems and platforms. We can
> maintain the package for Fedora/RHEL.
>
> Users on other systems can pressure their maintainers to add the
> package, or they can start the agent manually and do the manual
> specification using the environment variables as they can today.
>
> 3. The current certificate system needs fixing. Currently, a
> certificate pair gets generated when the rpm is installed, which is
> every time this remote set-up is called. The files are put in
> /etc/tcf/ssl/local.cert and /etc/tcf/ssl/local.priv. On the client
> side, if the files don't already exist in the state location of the
> plug-in, they are copied over to the client and remotely the tcf agent
> is asked to regenerate the certificate again. There is then a swap of
> the public certificates, which are copied to both sides to have the
> respective host names where they originated. Each side has a matched
> pair of "local.cert" and "local.priv" and each side has the other's
> local.cert renamed to be "{hostname}.cert".
>
> One problem is that the TCFSecurityManager run on the client side has
> hardcoded the names "Local.cert" and "Local.priv" for the public and
> private key methods. An alias name is passed in as a parameter, but
> the methods ignore it. The hard-coded names don't even match up with
> the files originally swapped by the set-up wizard, as they use "local"
> instead of "Local" (I got around this simply by renaming). For
> multiple clients using the same target, I think it will work because
> the system chooses to copy over the public keys on both sides. Each
> client will try to do the set-up (unless it knows the target is
> already running and the client wishes to set up the magic environment
> variables). This means a new certificate gets regenerated each time.
> Old clients won't have a valid public certificate that matches the
> remote target private key anymore, but that doesn't appear to matter.
> In channel_tcp.c there is a function called
> certificate_verify_callback() which is set up for verification. It
> compares the current certificate to each one ending in .cert in the
> TCF certificates directory. There does not appear to be a check for
> the private key matching the public one.
>
> I do see an issue with multiple Eclipse sessions from the same host.
> The problem is that currently the certificates are stored in the state
> location (per workspace). If not already set, the tcf agent is asked
> to regenerate. The problem is that the remote target will end up
> copying this new public certificate using the host name of the client.
> This will trash any previous set-up that was performed from another
> workspace on the same client. The answer to me would be to use the
> user location and to encode the username into the certificate name.
> Perhaps you have some other thoughts on this.
>
> 4. The tcf agent requires port 1534 to be open. The current set-up
> doesn't ensure this. I think that it should take care of this.
>
> 5.
> The tcf agent runs as a root daemon. This is dangerous since the tcf
> agent has the capability of doing just about anything (starting
> processes, moving files around). Does Lttng require root access?
> Valgrind does not, but OProfile, for instance, does. I have recently
> worked on setting up policy kit to run /usr/bin/opcontrol (using
> pkexec and having a policy file). Now comes the tricky part. One sets
> policy based on whether it is the active session, an inactive session,
> or "any". AFAICT, using ssh requires "any" access. Policy kit can be
> set up to allow, deny, or to require self authorization or admin
> authorization. It will challenge when needed, but I don't know what
> happens if we are using it remotely via Eclipse (i.e. how it will
> challenge and whether we can respond to that challenge).
>
> Ideally, I would like to set up the agent using root access, but to
> start it under a specified user's account. Any root access would be
> done via pkexec and policy set-up, preferably with no challenge
> required following the remote set-up, which does query for a userid,
> password, and root password. It would make sense for the root access
> issues to be tcf agent commands so that the client doesn't have to
> prefix them with pkexec.
>
> 6. Once set up, create the target with environment variables set up so
> the remote target just shows up in any Target list.
>
> Anything I missed? Thoughts?
>
> -- Jeff J.

--
Francois

_______________________________________________
linuxtools-dev mailing list
[email protected]
https://dev.eclipse.org/mailman/listinfo/linuxtools-dev
