Hello Jeff,

First, thanks for the initiative. I read your e-mail with great interest. It comes at the right time and covers some latent issues.
Here are some comments from a tracing perspective (both kernel and user space). I think you will find that there is a lot of commonality between profiling and tracing, although some of it is quite specific.

[1] To launch or not to launch...

As you probably know, instead of using the TCF launcher facility, we went for a TCF connector (in RSE) on the host and a TCF agent on the target node. That agent implements the set of commands we need - configure, control, trace retrieval, etc. - over TCF. The agent is started manually. Of course, the degenerate case of a "remote target" is the local host. By configuring, we mean enabling/disabling static tracepoints.

Note that the LTTng agent has to run as root - at least for kernel tracing - which is an annoying security issue for us. We hope to alleviate this problem eventually (famous last words), at least for UST. LTTng 2.0 should also help.

The RSE/TCF connector allows, on top of tracer control, navigation of the remote file system, visualization/killing of remote processes, starting a shell/terminal, etc., from a single view (Remote Systems). When you have multiple targets, we also think it would be nice to have everything centralized.

However, the main reason for opting for a connector was that LTTng kernel tracing is not about tracing an ordinary application (obviously): uploading and starting an instrumented kernel was not one of our use cases. In other words, since the instrumented code (the kernel) is already on the target, there is no need to upload anything - just hook into it and configure it live. Note that this will also be the case for embedded software and server-like applications, another important use case for us.

The story takes a new twist when we introduce UST (User Space Tracing): in that case, it makes sense to instrument and cross-compile your code on the host and then upload the binary to the target. After uploading, the binary can be configured and then run.
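To make the "hook to it and configure it live" part concrete, here is a minimal sketch of the kind of configure/control/start command set I have in mind for such an agent. Everything here is invented for illustration - the JSON protocol, the command names and the tracepoint names are not the real TCF services our agent implements:

```python
import json

# Toy stand-in for the tracer-control agent on the target. The real
# agent speaks TCF; this only illustrates the command categories we
# need: configure (enable/disable tracepoints), control (start/stop).
class TracerAgent:
    def __init__(self, tracepoints):
        # All static tracepoints start disabled.
        self.tracepoints = {tp: False for tp in tracepoints}
        self.running = False

    def handle(self, message):
        cmd = json.loads(message)
        if cmd["op"] == "configure":
            # Cherry-pick tracepoints live, no upload needed: the
            # instrumented code is already on the target.
            for tp, enabled in cmd["tracepoints"].items():
                if tp in self.tracepoints:
                    self.tracepoints[tp] = enabled
            return json.dumps({"ok": True})
        if cmd["op"] == "start":
            self.running = True
            return json.dumps({"ok": True})
        if cmd["op"] == "stop":
            self.running = False
            return json.dumps({"ok": True})
        return json.dumps({"ok": False, "error": "unknown op"})

# Example session: enable one kernel tracepoint, then start tracing.
agent = TracerAgent(["sched_switch", "irq_entry", "irq_exit"])
agent.handle(json.dumps({"op": "configure",
                         "tracepoints": {"sched_switch": True}}))
agent.handle(json.dumps({"op": "start"}))
```

The point is only that "configure" is a separate, repeatable operation from "start", which is what makes live reconfiguration of a running kernel (or long-lived server process) possible.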
Trace configuration has to happen between uploading and starting the application since, once it runs, it might be a bit late to cherry-pick which tracepoints to enable. Anyway, a launcher definitely seems to be the way to go in that case.

In short, I see a need for:
- A launcher to upload, configure and start an instrumented application
- A scheme to connect to a running process and dynamically select the tracepoints to enable/disable via an agent. This could be initiated by a launcher, in lieu of the RSE/TCF connector scheme, but it would still have to be supplemented by the equivalent of a control view once the connection is established (like the Debug view when you launch a CDT debug session).

[2] The launching experience

I agree with you that the TCF setup should be pushed - or copied - into Profiling Configurations. When you say that "the TCF Targets page could be pushed into the framework", I assume you are referring to the Linux Tools profiling framework, right? I am not really familiar with that component. My understanding is that it provides a launching facility for profiling applications, either locally or remotely. I have no issue with integrating tracing into the profiling launch scheme (although I believe that profiling is a special case of tracing :-).

For the tracepoint selection, my preferred approach would be similar to Eclipse tracing today: list the available tracepoints in some project config file (e.g. .tracing) that would be used to populate a Tracing tab in the launcher, where tracepoints can be selectively enabled (just like a Workbench/Java/JUnit launch today). That should be a good start for tracing from a developer's standpoint.

As you also mention, I don't see that it makes immediate sense to deploy the agent using the launcher (unless you are designing an agent, I guess). Anyway, I agree with you that it is probably more sensible to deliver the agent as a package for your target.
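To illustrate the .tracing idea mentioned above: a rough sketch of how the launcher could read such a project file to populate the Tracing tab. The file format (one name=enabled pair per line) and the tracepoint names are assumptions for the example - we would still have to define the actual format:

```python
# Hypothetical ".tracing" project file: one "tracepoint = enabled" pair
# per line, '#' starts a comment. This format is an assumption, not an
# existing Linux Tools convention.
def parse_tracing_config(text):
    tracepoints = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        # The Tracing tab would show each name with a checkbox preset
        # from this flag, like the tracing page of an Eclipse launch.
        tracepoints[name.strip()] = value.strip().lower() in ("1", "true", "yes")
    return tracepoints

sample = """
# Tracepoints available in this project (illustrative names)
ust_baddr_statedump  = true
my_app:request_start = true
my_app:request_end   = false
"""
enabled = parse_tracing_config(sample)
```

The launcher would then pass only the enabled names to the agent's configure step before starting the uploaded binary.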
However, it might make sense to start the agent from the launcher. BTW, I really like the selection of available targets (in the Target tab) - thanks to port 1534, I guess. What I would also like is a way to query the discovered agents about their capabilities (profiling, code coverage, etc.), version, state, and so on. This could be very useful if we ever standardize on a TCF solution for Linux Tools remote stuff.

This is where it gets trickier: for tracing purposes, I would like the agents to be aware of the running instrumented apps and to be able to relay that information, along with the corresponding tracepoints, to the host (so the Tracing tab - where you enable your tracepoints - can be dynamically populated). This could really complicate the agent design...

[3] The certificate system

I haven't come anywhere close to that part of the code! From what you describe, I see a healthy mix of bugs and enhancement needs :-)

Potential bugs:
- the ignored alias name
- the hard-coded names with wrong capitalization
- the missing check that the private and public keys match

Enhancements:
- avoid the unnecessary regeneration of certificates at set-up
- simultaneous sessions from the same host (your proposed scheme looks good)

[4] Port 1534

That the launcher doesn't open the port looks like a shortcoming to me too.

[5] TCF as a root daemon

Like you, I see major security issues (in the general sense - not only intrusions) with this situation. When you suggest configuring the agent as root but starting it under a specified user's account, are you referring to a scheme like mysqld (started by mysql)? That would be cool (not to mention very Linuxy).

[6] Environment variables

Having no real experience with the TCF launcher yet, I'm afraid I don't fully grasp how annoying this can be. If it is what I suspect, i.e.
a way to configure your agent and ultimately control your app, then I see no reason not to preset these before launching, although this might complicate the launching protocol.

[7] Thoughts

I think you have pretty much covered the launching aspects for profiling. With some additions, I guess that tracing can also be well served, although there are still some question marks on my side (namely dynamic control after launching).

I also think that this work could be done in the spirit of Andrew's silo-breaking idea - a common agent that supports remote Linux Tools (at least some of them). It would indeed be very cool to have such an agent that supports Valgrind, GCov, GProf, etc. What extra functionality would be required to easily integrate more tools? It would be great to have comments from the other Linux Toolers on this one.

That's it for today (I have a build to test :-)

Best Regards,
/fc

On Wed, May 25, 2011 at 5:52 PM, Jeff Johnston <[email protected]> wrote:
> Hi Francois,
>
> I'd like to start a dialog with you regarding cleaning up the TCF
> implementation for remote profiling/tracing.
>
> The first thing I'd like to tackle is the set-up design. I am
> primarily interested in how this can be directed from the client. In
> the current design, remote set-up must occur via the Debug
> Configurations... dialog under TCF.
>
> There, one can find a list of targets. After unclicking the use-local
> options, the Add... button becomes active for the list of targets.
> Pressing this, a set of options are provided, one being the option to
> set up an agent remotely. Selecting this action, one gets a dialog to
> enter the root password and a separate userid/password for the remote
> system. This kicks off a remote shell that:
>
> 1. uninstalls the remote tcf-agent rpm and installs the
> "most-appropriate" tcf-agent rpm for the remote system
> 2. as part of install, runs tcf-agent to create a certificate which is
> passed back to the client and then invoked again.
> A public certificate copy is performed by both host and remote from
> each other.
> 3. starts the TCF agent as a root daemon
>
> There is a bug in the code whereby the certificate names set up are
> not in sync with the code that ultimately uses them (i.e. Local.cert
> vs local.cert).
>
> Once the remote shell is done, manual set-up of the agent via
> environment variables is still required, which is very annoying.
>
> Here's how I would like to change the mechanism.
>
> 1. Copy the TCF set-up into Profiling Configurations... The TCF
> Targets page could be pushed into the framework so each tool could
> reuse it.
>
> 2. I would like to cease the shipping of the rpm over to the remote
> system. Instead, I would like to create a new Fedora/RHEL package
> containing our TCF agent(s). That way it can be installed from the
> remote side using the normal mechanisms. This prevents the client from
> having to store rpms for all supported systems and platforms. We can
> maintain the package for Fedora/RHEL.
>
> Users on other systems can pressure their maintainers to add the
> package, or they can start the agent manually and do the manual
> specification using the environment variables as they can today.
>
> 3. The current certificate system needs fixing. Currently, a
> certificate pair gets generated when the rpm is installed, which is
> every time this remote set-up is called. The files are put in
> /etc/tcf/ssl/local.cert and /etc/tcf/ssl/local.priv. On the client
> side, if the files don't already exist in the state location of the
> plug-in, they are copied over to the client and remotely the tcf agent
> is asked to regenerate the certificate again. There is then a swap of
> the public certificates, which are copied to both sides to have the
> respective host names where they originated. Each side has a matched
> pair of "local.cert" and "local.priv" and each side has the other's
> local.cert renamed to be "{hostname}.cert".
>
> One problem is that the TCFSecurityManager run on the client side has
> hardcoded the names "Local.cert" and "Local.priv" for the public and
> private key methods. An alias name is passed in as a parameter, but
> the methods ignore it. The hard-coded names don't even match up with
> the files originally swapped by the set-up wizard, as they use "local"
> instead of "Local" (I got around this simply by renaming). For
> multiple clients using the same target, I think it will work because
> the system chooses to copy over the public keys on both sides. Each
> client will try to do the set-up (unless it knows the target is
> already running and the client wishes to set up the magic environment
> variables). This means a new certificate gets regenerated each time.
> Old clients won't have a valid public certificate that matches the
> remote target private key anymore, but that doesn't appear to matter.
> In channel_tcp.c there is a function called
> certificate_verify_callback() which is set up for verification. It
> compares the current certificate to each one ending in .cert in the
> TCF certificates directory. There does not appear to be a check for
> the private key matching the public one.
>
> I do see an issue with multiple Eclipse sessions from the same host.
> The problem is that currently the certificates are stored in the state
> location (per workspace). If not already set, the tcf agent is asked
> to regenerate. The problem is that the remote target will end up
> copying this new public certificate using the host name of the client.
> This will trash any previous set-up that was performed from another
> workspace on the same client. The answer to me would be to use the
> user location and to encode the username into the certificate name.
> Perhaps you have some other thoughts on this.
>
> 4. The tcf agent requires port 1534 to be open. The current set-up
> doesn't ensure this. I think that it should take care of this.
>
> 5.
> The tcf agent runs as a root daemon. This is dangerous since the tcf
> agent has the capability of doing just about anything (starting
> processes, moving files around). Does Lttng require root access?
> Valgrind does not, but OProfile, for instance, does. I have recently
> worked on setting up policy kit to run /usr/bin/opcontrol (using
> pkexec and having a policy file). Now comes the tricky part. One sets
> policy based on whether it is the active session, an inactive session,
> or "any". AFAICT, using ssh requires "any" access. Policy kit can be
> set up to allow, deny, or to require self authorization or admin
> authorization. It will challenge when needed, but I don't know what
> happens if we are using it remotely via Eclipse (i.e. how it will
> challenge and whether we can respond to that challenge).
>
> Ideally, I would like to set up the agent using root access, but to
> start it under a specified user's account. Any root access would be
> done via pkexec and policy set-up, preferably with no challenge
> required following the remote set-up, which does query for a userid,
> password, and root password. It would make sense for the root access
> issues to be tcf agent commands so that the client doesn't have to
> prefix them with pkexec.
>
> 6. Once set up, create the target with environment variables set up so
> the remote target just shows up in any Target list.
>
> Anything I missed? Thoughts?
>
> -- Jeff J.

--
Francois

_______________________________________________
linuxtools-dev mailing list
[email protected]
https://dev.eclipse.org/mailman/listinfo/linuxtools-dev
