Gerlando, I agree. The __attribute__((weak)) is not strictly necessary
in this case and the problem can be worked around temporarily by
removing this attribute. The reason is that the
__start___tracepoints_ptrs and __stop___tracepoints_ptrs are only
being declared, not defined, at compilation time. There is no need for
a weak definition if they are not defined at all. In fact the
definition is provided automagically by the linker using weak
semantics (i.e. only one definition per ELF binary, shared by all
declarations in all compilation units) regardless of the presence or
absence of the weak attribute. Since __start___tracepoints_ptrs is defined
by the linker as the starting address of the __tracepoints_ptrs
section, it would be impossible for it to have anything other than
weak semantics, because it is nonsensical for different object files
in the same ELF binary to have different addresses for the same
section.
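To make the linker behaviour described above concrete, here is a minimal sketch in C, assuming GCC or Clang with GNU ld (or a compatible linker) on an ELF target. The section name demo_ptrs, struct demo_item and the demo_* functions are hypothetical and are not part of lttng-ust; the only thing borrowed from tracepoint.h is the pattern of declaring the __start_/__stop_ symbols as extern arrays without ever defining them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type standing in for struct tracepoint. */
struct demo_item {
    const char *name;
};

static struct demo_item item_a = { "a" };
static struct demo_item item_b = { "b" };

/*
 * Place one pointer per item into a custom ELF section. The section name
 * (demo_ptrs, an assumption for this sketch) must be a valid C identifier,
 * otherwise GNU ld does not generate the __start_/__stop_ symbols.
 */
static struct demo_item * const ptr_a
    __attribute__((used, section("demo_ptrs"))) = &item_a;
static struct demo_item * const ptr_b
    __attribute__((used, section("demo_ptrs"))) = &item_b;

/* Declared only; the linker supplies the definitions at link time. */
extern struct demo_item * const __start_demo_ptrs[];
extern struct demo_item * const __stop_demo_ptrs[];

/* Number of pointers the linker collected into the section. */
size_t demo_count(void)
{
    assert(__stop_demo_ptrs >= __start_demo_ptrs);
    return (size_t)(__stop_demo_ptrs - __start_demo_ptrs);
}

/* Returns 1 if an item with the given name is present in the section. */
int demo_has(const char *name)
{
    struct demo_item * const *p;

    for (p = __start_demo_ptrs; p < __stop_demo_ptrs; p++) {
        if (*p && strcmp((*p)->name, name) == 0)
            return 1;
    }
    return 0;
}
```

Calling demo_count() from main() should report the two pointers placed in the section, even though no source file ever defines __start_demo_ptrs or __stop_demo_ptrs.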
Although removing __attribute__((weak)) works as a workaround,
I would not recommend upstreaming it. Since these symbols have weak
semantics, they should have weak declarations. Removing this attribute
could cause a lot of confusion for people reading the code.
I haven't tried Paul's patch, but it also seems like a reasonable local
workaround rather than the sort of thing to upstream.
For a long term fix, in my opinion, Yocto/OpenEmbedded needs to fix
their compiler patches.
Martin
On Tue, May 27, 2014 at 9:04 AM, Gerlando Falauto <[email protected]> wrote:
Hi Paul,
thanks for your explanation, but I'm more puzzled than ever.
I'm definitely lacking the appropriate background in both
terminology and internals, so I tried to figure out how the whole
magic works by empirical testing.
Now, when you say:
> The reason is that you can have the same tracepoint provider be USED in
> several compilation units that will all become part of one and the same
> shared object (or executable).
>
> Then all those __start/stop___tracepoints_ptrs references in different
> compilation units should refer to the same
> __start/stop___tracepoints_ptrs definitions for the shared object (or
> executable) they are part of. This is required because the
> initialization of the tracepoints will only happen once per shared
> object (or executable) with the static ctor mechanism also defined in
> tracepoint.h
Who's responsible for initializing the tracepoints? Isn't it the
PROVIDER, instead of the user?
Here's what I understood (or rather, speculated!), so please point
out where my understanding falls short.
Tracepoint providers (where TRACEPOINT_DEFINE is defined) are what
actually implement tracepoints. You can have multiple source
files, each defining one or more tracepoints. So in the end each
object file will contain one or more tracepoint pointers within
its "__tracepoints_ptrs" section (courtesy of the compiler). When
linking (e.g. into a shared object), the linker merges all of these
sections into a single __tracepoints_ptrs section in the output ELF
binary, holding all the pointers as a contiguous array. The linker
also automagically defines the __start___tracepoints_ptrs /
__stop___tracepoints_ptrs symbols, which mark the beginning and end
of that section.
Each object file will contain its own __tracepoints__ptrs_init()
constructor, responsible for registering ALL the tracepoints it
provides. Actually, we want only ONE constructor per shared object
to register all the tracepoint pointers provided by the whole
shared object (contained within
__start___tracepoints_ptrs/__stop___tracepoints_ptrs). This is
where, for instance, __tracepoint_ptrs_registered comes into play.
Multiple invocations of the constructor (one per object file)
should be avoided and only the first one needs to be performed.
And this is why __tracepoint_ptrs_registered needs to be weak
(multiple source files could lead to multiple definitions -- we
want one and only one per shared object) *AND* hidden (each shared
object should have its own copy).
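The guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the lttng-ust source: demo_ptrs_registered, demo_register_calls and the demo_* functions are hypothetical stand-ins, and the second object file is simulated by calling the init routine again from the same file. It assumes GCC or Clang on an ELF target.

```c
#include <assert.h>

/*
 * Weak, so that every compilation unit that emits this definition ends up
 * sharing a single copy after linking; hidden, so that each shared object
 * (or the main program) keeps its own copy. Both names are hypothetical.
 */
int demo_ptrs_registered __attribute__((weak, visibility("hidden")));
int demo_register_calls __attribute__((weak, visibility("hidden")));

/* Stand-in for the body of the per-object-file constructor. */
static void demo_ptrs_init(void)
{
    if (demo_ptrs_registered++)
        return;             /* another unit already did the registration */
    demo_register_calls++;  /* stands in for the real registration call */
}

/* Runs automatically before main(), once for this object file. */
static void __attribute__((constructor)) demo_ctor(void)
{
    demo_ptrs_init();
}

/* Simulates the constructor of a second object file in the same binary. */
int demo_second_unit_ctor(void)
{
    demo_ptrs_init();
    assert(demo_register_calls == 1);
    return demo_register_calls;
}
```

However many units run the init routine, the counter only lets the first one through, so the registration happens once per binary.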
If I remove the weak attribute from __tracepoint_ptrs_registered,
the linker starts screaming as soon as I compile one of the examples.
On the other hand,
__start___tracepoints_ptrs/__stop___tracepoints_ptrs are generated
by the linker (or so I want to believe!) so only one instance is
emitted.
Keeping them hidden prevents the name clash during dynamic
linking, as the symbol will not be visible from other shared
objects or binaries.
But I don't see why they should also be weak.
As a matter of fact, removing the weak attribute seems to fix my
problem (as far as I could test).
What am I missing?
Thank you again for your patience,
Gerlando
On 05/27/2014 04:58 PM, Woegerer, Paul wrote:
On 05/27/2014 04:41 PM, Gerlando Falauto wrote:
Hi Paul,
thank you very much for sharing this.
I had in the meantime run into the same suggestion by Henrik Wallin
on a thread opened by Martin
(https://gcc.gnu.org/ml/gcc-help/2014-05/msg00028.html).
Further updates from Martin also suggest the issue is rather related
to the OpenEmbedded toolchain.
I was about to post the "opposite" of your patch, as I don't see the
need to have those symbols as weak. In the end, doesn't weak only
allow for a further re-definition? In this case we're only declaring
it as extern, aren't we?
Definition actually happens by magic, as far as I can tell.
But please correct me if I got it all wrong.
It's more complicated.
You absolutely need those symbols to be declared as:
.weak __start___tracepoints_ptrs
.weak __stop___tracepoints_ptrs
*and*
.hidden __start___tracepoints_ptrs
.hidden __stop___tracepoints_ptrs
The reason is that you can have the same tracepoint provider be USED in
several compilation units that will all become part of one and the same
shared object (or executable).
Then all those __start/stop___tracepoints_ptrs references in different
compilation units should refer to the same
__start/stop___tracepoints_ptrs definitions for the shared object (or
executable) they are part of. This is required because the
initialization of the tracepoints will only happen once per shared
object (or executable) with the static ctor mechanism also defined in
tracepoint.h
HTH,
Paul
Thank you,
Gerlando
On 05/27/2014 04:32 PM, Woegerer, Paul wrote:
Hi Martin, Hi Gerlando,
this sounds a lot like the compiler bug I found recently in Yocto 1.6
(reproducible on ARM, x86 and PPC).
The problem in my case is that the Yocto-generated GCC cross-compiler
translates:
extern struct tracepoint * const
__start___tracepoints_ptrs[]
__attribute__((weak, visibility("hidden")));
extern struct tracepoint * const
__stop___tracepoints_ptrs[]
__attribute__((weak, visibility("hidden")));
incorrectly to assembly. For these symbols, which are declared extern
with
__attribute__((weak, visibility("hidden")));
the following lines are missing from the generated assembly:
.hidden __stop___tracepoints_ptrs
.hidden __start___tracepoints_ptrs
This causes __stop___tracepoints_ptrs and __start___tracepoints_ptrs
to be treated as ordinary weak symbols instead of per-shared-object
weak symbols.
That in turn causes the linker to resolve any such symbol with the
first definition of it that it can see (it will not constrain itself
to only consider definitions from within the same shared object). The
net result is that only one tracepoint provider gets activated (the
first one the linker sees) instead of all the tracepoint providers
used in the various source files.
To fix this I use the following lttng-ust workaround (for now):
diff --git a/include/lttng/tracepoint.h b/include/lttng/tracepoint.h
index 66e2abd..50cef26 100644
--- a/include/lttng/tracepoint.h
+++ b/include/lttng/tracepoint.h
@@ -313,9 +313,11 @@ __tracepoints__destroy(void)
  * (or for the whole main program).
  */
 extern struct tracepoint * const __start___tracepoints_ptrs[]
-	__attribute__((weak, visibility("hidden")));
+	__attribute__((weak));
+asm(".hidden __start___tracepoints_ptrs");
 extern struct tracepoint * const __stop___tracepoints_ptrs[]
-	__attribute__((weak, visibility("hidden")));
+	__attribute__((weak));
+asm(".hidden __stop___tracepoints_ptrs");

 /*
  * When TRACEPOINT_PROBE_DYNAMIC_LINKAGE is defined, we do not emit a
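The same workaround pattern can be tried in isolation on a throwaway section. The sketch below is only an illustration, assuming GCC or Clang with the GNU assembler syntax on an ELF target; the section name demo_probe_ptrs, struct demo_probe and demo_probe_count() are hypothetical. The symbols are declared weak through the attribute, and hidden visibility is requested through a top-level asm statement instead of visibility("hidden").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type standing in for struct tracepoint. */
struct demo_probe {
    int id;
};

static struct demo_probe probe_one = { 1 };

/* One pointer placed in a custom section (name is an assumption). */
static struct demo_probe * const probe_ptr
    __attribute__((used, section("demo_probe_ptrs"))) = &probe_one;

/*
 * Weak comes from the attribute; hidden visibility is emitted directly as
 * an assembler directive, so it does not depend on how the compiler
 * translates visibility("hidden") for extern declarations.
 */
extern struct demo_probe * const __start_demo_probe_ptrs[]
    __attribute__((weak));
__asm__(".hidden __start_demo_probe_ptrs");
extern struct demo_probe * const __stop_demo_probe_ptrs[]
    __attribute__((weak));
__asm__(".hidden __stop_demo_probe_ptrs");

/* Number of pointers between the linker-provided start and stop symbols. */
size_t demo_probe_count(void)
{
    assert(__start_demo_probe_ptrs != NULL);
    return (size_t)(__stop_demo_probe_ptrs - __start_demo_probe_ptrs);
}
```

Compiling this file with -S and searching the output for .hidden, or inspecting the linked binary with objdump -t, is one way to check that the symbols end up local to the binary.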
Note that this issue is not reproducible with my GCC on host:
gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux)
and also not with the latest Codebench 2014.05 ARM-Linux
cross-toolchain.
--
Best,
Paul
On 05/27/2014 01:55 PM, Gerlando Falauto wrote:
Hi Martin,
I have been struggling for a while with this issue (see the whole
thread):
http://lists.lttng.org/pipermail/lttng-dev/2014-May/023035.html
and landed on the same conclusions as yours (found your message by
searching for __start___tracepoints_ptr!).
So at least you're not alone!
So, did you ever manage to get any of your questions answered:
1) Have you run into a problem like this? Is there a known
fix/workaround?
2) __start___tracepoints_ptrs is declared as extern in tracepoint.h,
but it is not defined. This appears to be some sort of undocumented
linker magic. http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html is
the only reference I could find. Do you know where this behavior is
documented or specified (if at all)?
3) Do you know why the symbol visibility for
__start___tracepoints_ptrs changed between 4.6.4 and 4.7.2?
Thank you so much!
Gerlando
BTW, I'm also running GCC 4.7.2 (lttng-ust is cross-compiled, the test
application is natively compiled).
On an x86_64 host running either GCC 4.4.6 or 4.4.7, the issue is not
observed.
On 04/30/2014 11:57 PM, Martin Ünsal wrote:
Incidentally I also asked for help on the GNU linker-specific part
(question 2) here:
http://gcc.gnu.org/ml/gcc-help/2014-04/msg00164.html
Martin
On Wed, Apr 30, 2014 at 2:21 PM, Martin Ünsal <[email protected]> wrote:
Hi LTTng folks
I have a strange problem using LTTng-UST on an ARM based platform. I
have done some diagnosis but I am running low on ideas and was hoping
for help from the experts. I am using lttng-tools 2.2.0, lttng-ust
2.2.0, liburcu 0.8.1. I know these are old but upgrading is easier
said than done unfortunately. I didn't see anything related to this
problem in relnotes, mailing list traffic, or master branch, but I
could have missed something.
The problem showed up when I switched from GCC 4.6.4 to 4.7.2.
Conceptually, the situation is that I have a single executable, call
it MyProgram, with two plugins loaded at runtime with dlopen(), let's
call them libPlugin1.so and libPlugin2.so. There are three different
LTTng-UST tracepoint providers, one each for the executable and the
two plugins. With GCC 4.7.2, tracepoints in libPlugin1 stopped
working. The tracepoints in MyProgram and in libPlugin2 continue to
work correctly.
I have established without a doubt that the toolchain upgrade is the
cause of the regression.
In the debugger, I confirmed that the tracepoint for libPlugin1.so is
being executed, but __tracepoint_##provider##___##name.state is always
0 even when I enable the tracepoint in lttng-tools. As a result the
tracepoint callback is not being invoked when it should be. In
MyProgram and libPlugin2.so, the .state variable correctly reflects
whether the tracepoint is enabled, and if the tracepoint is enabled,
the tracepoint callback is invoked.
Next I set a breakpoint in tracepoint_register_lib() and looked at
the tracepoints_start parameter.
1) With GCC 4.6.4 everything is as expected:
a) tracepoint_register_lib() for MyProgram called with
MyProgramProvider's __start___tracepoints_ptrs.
b) tracepoint_register_lib() after libPlugin1 dlopen() called with
libPlugin1Provider's __start___tracepoints_ptrs
c) tracepoint_register_lib() after libPlugin2 dlopen() called with
libPlugin2Provider's __start___tracepoints_ptrs
2) With GCC 4.7.2 there is a problem:
a) tracepoint_register_lib() for MyProgram called with
MyProgramProvider's __start___tracepoints_ptrs.
b) tracepoint_register_lib() after libPlugin1 dlopen() called with
MyProgramProvider's __start___tracepoints_ptrs (!!!! THIS IS WRONG !!!!)
c) tracepoint_register_lib() after libPlugin2 dlopen() called with
libPlugin2Provider's __start___tracepoints_ptrs
I looked at the symbol table for libPlugin1.so to see if it would
shed some light on the problem.
1) With GCC 4.6.4:
# objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
00025bb0 l *ABS* 00000000 __start___tracepoints_ptrs
# objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
00041eb4 l *ABS* 00000000 __start___tracepoints_ptrs
2) With GCC 4.7.2:
# objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
00025a90 g __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
# objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
00041eb4 g __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
My hypothesis at this point is that since __start___tracepoints_ptrs
changed from a local to a global symbol, the dynamic loader no longer
knows how to select the correct weak symbol. I cannot explain why
libPlugin2 still loads its provider correctly, perhaps it is just
getting lucky.
A few questions come to mind...
1) Have you run into a problem like this? Is there a known
fix/workaround?
2) __start___tracepoints_ptrs is declared as extern in tracepoint.h,
but it is not defined. This appears to be some sort of undocumented
linker magic. http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html is
the only reference I could find. Do you know where this behavior is
documented or specified (if at all)?
3) Do you know why the symbol visibility for
__start___tracepoints_ptrs changed between 4.6.4 and 4.7.2?
Thanks for any help. This is a real puzzler for me.
Martin
_______________________________________________
lttng-dev mailing list
[email protected]
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev