Hi Paul, Martin,

thank you so much for your answers and patience.

On 05/28/2014 12:38 PM, Woegerer, Paul wrote:
Hi Martin, Hi Gerlando

I tried your approach of removing __attribute__((weak)) and to my
surprise this really seems to be sufficient.

What stuns me is that providing the visibility attribute hidden
implicitly also makes the linker treat the symbol as a weak symbol (in
the sense that it can be linked without a definition being provided
anywhere for it).

I'll have to disagree here, at least I don't have the same feeling.

Here's what I get [with my buggy compiler] if I
a) remove __attribute__((weak)) *AND*
b) rename section("__tracepoints_ptrs") to section("__tracepoints_ptrs_XXX")

(thereby preventing the linker from creating the correct __start__/__stop__ symbols):

/opt/eldk-5.4/powerpc-softfloat/sysroots/powerpc-nf-linux/usr/lib/crtn.o
./.libs/liblttng-ust-runtime.a(lttng-ust-baddr.o):(.got2+0x34): undefined reference to `__start___tracepoints_ptrs'
./.libs/liblttng-ust-runtime.a(lttng-ust-baddr.o):(.got2+0x38): undefined reference to `__stop___tracepoints_ptrs'
./.libs/liblttng-ust-runtime.a(tracef.o):(.got2+0x20): undefined reference to `__start___tracepoints_ptrs'
./.libs/liblttng-ust-runtime.a(tracef.o):(.got2+0x24): undefined reference to `__stop___tracepoints_ptrs'
collect2: error: ld returned 1 exit status

So the hidden symbols are *NOT* weak at all (at least with my buggy compiler). They are just automagically defined by the linker. As a matter of fact, I don't think they should have ever been weak in the first place. We *WANT* those symbols to exist and be well-defined, and we should make sure the linker complies with this requirement, as this is crucial to the correct behaviour of lttng-ust. If we generate an inconsistency like the above and keep the weak attribute, we would end up with code which compiles perfectly but still will not work!
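
To illustrate the risk, here is a minimal sketch, assuming an ELF target with GCC or Clang and a GNU-style linker. The symbol name demo_missing_symbol is hypothetical and deliberately defined nowhere. Because the declaration is weak, the link succeeds and the reference quietly resolves to a null address. Without the weak attribute, the same missing definition should instead produce an "undefined reference" error like the ones quoted above.

```c
#include <stddef.h>

/*
 * Hypothetical symbol, for illustration only: no object file, library or
 * linker script defines it. The weak attribute lets the link succeed anyway.
 */
extern int demo_missing_symbol __attribute__((weak));

/* Returns 1 if the linker found a definition, 0 if the reference is null. */
int demo_symbol_is_defined(void)
{
    return &demo_missing_symbol != NULL;
}
```

Code that dereferenced such a symbol without checking would build cleanly and only misbehave at run time.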

BTW, the only reference I could find to how and why ld defines the __start_/__stop_ symbols for a section is [1], which admittedly states: "I couldn't find any formal documentation for this feature, only a few obscure mailing list references". :-(

So, as to Martin's statement:

>For a long term fix, in my opinion, Yocto/OpenEmbedded needs to fix
>their compiler patches.

This is definitely true. However, we should also somehow spare other people the frustration we have both been through. So if in the end it turns out that removing __attribute__((weak)) is *NOT* "The Right Thing To Do (TM)", we should at least implement a compiler check at configure time, and either bail out with a meaningful message, or define a preprocessor macro so we can cope with that within tracepoint.h.
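
As a rough sketch of the second option (a preprocessor switch set by configure), the declarations could be selected as below. Everything here is an assumption for illustration: the macro DEMO_HIDDEN_WEAK_IS_BROKEN, the section name __demo_ptrs and all demo_* identifiers are made up, and the fallback branch simply reuses the asm(".hidden ...") trick from Paul's workaround patch. An ELF target with a GNU-style assembler and linker is assumed.

```c
#include <stddef.h>

struct demo_tracepoint { const char *name; };

static struct demo_tracepoint demo_tp = { "demo:event" };

/* One entry, so the linker has a "__demo_ptrs" section to bracket. */
static struct demo_tracepoint *demo_tp_ptr
    __attribute__((used, section("__demo_ptrs"))) = &demo_tp;

#ifdef DEMO_HIDDEN_WEAK_IS_BROKEN   /* hypothetical, would be set by configure */
/* The compiler loses .hidden on weak externs: emit the directive by hand. */
extern struct demo_tracepoint * const __start___demo_ptrs[]
    __attribute__((weak));
__asm__(".hidden __start___demo_ptrs");
extern struct demo_tracepoint * const __stop___demo_ptrs[]
    __attribute__((weak));
__asm__(".hidden __stop___demo_ptrs");
#else
/* Healthy compiler: the attribute alone is enough. */
extern struct demo_tracepoint * const __start___demo_ptrs[]
    __attribute__((weak, visibility("hidden")));
extern struct demo_tracepoint * const __stop___demo_ptrs[]
    __attribute__((weak, visibility("hidden")));
#endif

/* Number of pointers the linker collected into the section. */
size_t demo_tracepoint_count(void)
{
    return (size_t)(__stop___demo_ptrs - __start___demo_ptrs);
}
```

The configure-side test that would decide whether to define the macro is not shown; one possibility would be to compile such a declaration to assembly and look for the .hidden directive.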

Still, I believe we should "kill the weak". [2] ;-)

What do you think?

Thanks again!
Gerlando

[1] http://mgalgs.github.io/2013/05/10/hacking-your-ELF-for-fun-and-profit.html
[2] No Nazi propaganda intended! ;-)

I thought tagging them weak is required for exactly
that. But apparently this is not the case. It would be interesting if
this treatment of hidden symbols is standardized or if this is just an
implementation-specific behavior of GNU ld.

thanks,
Paul

On 05/28/2014 02:39 AM, Martin Ünsal wrote:
Gerlando, I agree. The __attribute__((weak)) is not strictly necessary
in this case and the problem can be worked around temporarily by
removing this attribute. The reason is that the
__start___tracepoints_ptrs and __stop___tracepoints_ptrs are only
being declared, not defined, at compilation time. There is no need for
a weak definition if they are not defined at all. In fact the
definition is provided automagically by the linker using weak
semantics (i.e. only one definition per ELF binary, shared by all
declarations in all compilation units) regardless of the presence or
absence of weak attribute. Since __start___tracepoints_ptrs is defined
by the linker as the starting address of the __tracepoints_ptrs
section, it would be impossible for it to have anything other than
weak semantics, because it is nonsensical for different object files
in the same ELF binary to have different addresses for the same
executable section.

Although removing __attribute__((weak)) is successful as a workaround,
I would not recommend upstreaming it. Since these symbols have weak
semantics, they should have weak declarations. Removing this attribute
could cause a lot of confusion for people reading the code.

I haven't tried Paul's patch; it also seems like a reasonable local
workaround, but not the sort of thing to upstream.

For a long term fix, in my opinion, Yocto/OpenEmbedded needs to fix
their compiler patches.

Martin



On Tue, May 27, 2014 at 9:04 AM, Gerlando Falauto wrote:

     Hi Paul,

     thanks for your explanation, but I'm more puzzled than ever.
     I'm definitely lacking the appropriate background in both
     terminology and internals, so I tried to figure out how the whole
     magic works by empirical testing.

     Now, when you say:


     > The reason is that you can have the same tracepoint provider be USED in
     > several compilation units that will all become part of one and the same
     > shared object (or executable).
     >
     > Then all those __start/stop___tracepoints_ptrs references in different
     > compilation units should refer to the same
     > __start/stop___tracepoints_ptrs definitions for the shared object (or
     > executable) they are part of. This is required because the
     > initialization of the tracepoints will only happen once per shared
     > object (or executable) with the static ctor mechanism also defined in
     > tracepoint.h

     Who's responsible for initializing the tracepoints? Isn't it the
     PROVIDER, instead of the user?

     Here's what I understood (or rather, speculated!), so please point
     out where my understanding falls short.

     Tracepoint providers (where TRACEPOINT_DEFINE is defined) are what
     actually implement tracepoints. You can have multiple source
     files, each defining one or more tracepoints. So in the end each
     object file will contain one or more tracepoint pointers within
     its "__tracepoints_ptrs" section (courtesy of the compiler). When
     linking (e.g. into a shared object), the linker merges the
     sections of all the above objects into a single
     __tracepoints_ptrs section in the output ELF binary, which holds
     all the pointers as a contiguous array. The linker also
     automagically defines the __start___tracepoints_ptrs /
     __stop___tracepoints_ptrs symbols to mark the beginning and end
     of that section.
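
The mechanism described in this paragraph can be sketched in a few lines. This is an illustration under assumptions, not code from lttng-ust: the section name __demo_ptrs and all demo_* names are hypothetical, and it relies on the GNU ld behaviour of defining __start_<section> / __stop_<section> for sections whose names are valid C identifiers. The symbols are declared hidden but not weak, so a missing section would be expected to fail at link time.

```c
#include <stddef.h>
#include <string.h>

struct demo_tracepoint { const char *name; };

static struct demo_tracepoint demo_tp_a = { "demo:a" };
static struct demo_tracepoint demo_tp_b = { "demo:b" };

/* The compiler places each pointer in this object's "__demo_ptrs" section. */
static struct demo_tracepoint *demo_ptr_a
    __attribute__((used, section("__demo_ptrs"))) = &demo_tp_a;
static struct demo_tracepoint *demo_ptr_b
    __attribute__((used, section("__demo_ptrs"))) = &demo_tp_b;

/* Only declared here; the linker defines them around the merged section. */
extern struct demo_tracepoint * const __start___demo_ptrs[]
    __attribute__((visibility("hidden")));
extern struct demo_tracepoint * const __stop___demo_ptrs[]
    __attribute__((visibility("hidden")));

/* Returns 1 if some pointer in the section refers to a tracepoint so named. */
int demo_has_tracepoint(const char *name)
{
    struct demo_tracepoint * const *p;

    /* The merged section is a contiguous array of pointers. */
    for (p = __start___demo_ptrs; p < __stop___demo_ptrs; p++)
        if (strcmp((*p)->name, name) == 0)
            return 1;
    return 0;
}
```

With several object files contributing to the same section, the loop would see all of their pointers, since the linker concatenates the input sections.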

     Each object file will contain its own __tracepoints__ptrs_init()
     constructor, responsible for registering ALL the tracepoints it
     provides. Actually, we want only ONE constructor per shared object
     to register all the tracepoint pointers provided by the whole
     shared object (contained within
     __start___tracepoints_ptrs/__stop___tracepoints_ptrs). This is
     where, for instance, __tracepoint_ptrs_registered comes into play.
     Multiple invocations of the constructor (one per object file)
     should be avoided and only the first one needs to be performed.
     And this is why __tracepoint_ptrs_registered needs to be weak
     (multiple source files could lead to multiple definitions -- we
     want one and only one per shared object) *AND* hidden (each shared
     object should have its own copy).
     If I remove the weak attribute from __tracepoint_ptrs_registered,
     the linker starts screaming as soon as I compile one of the examples.
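
A compressed sketch of the once-per-object registration described above, under stated assumptions: the demo_* names are hypothetical stand-ins (the real ones live in tracepoint.h), and the two constructors in one file merely imitate two object files that each pulled in the header.

```c
/*
 * Weak: every object file may carry this definition and the linker keeps one.
 * Hidden: the surviving copy is private to this shared object or executable.
 */
int demo_ptrs_registered __attribute__((weak, visibility("hidden")));

/* Counts how often the (stand-in) registration really ran. */
int demo_register_calls;

/* Common body of the per-object-file constructors. */
static void demo_ptrs_init(void)
{
    if (demo_ptrs_registered++)
        return;             /* an earlier constructor already registered */
    demo_register_calls++;  /* stand-in for tracepoint_register_lib() */
}

/* Imitate two object files, each contributing its own constructor. */
static void __attribute__((constructor)) demo_ctor_object_a(void)
{
    demo_ptrs_init();
}

static void __attribute__((constructor)) demo_ctor_object_b(void)
{
    demo_ptrs_init();
}
```

Both constructors run before main(), but only the first one performs the registration; the counter records that the second one arrived and backed off.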

     On the other hand,
     __start___tracepoints_ptrs/__stop___tracepoints_ptrs are generated
     by the linker (or so I want to believe!) so only one instance is
     emitted.
     Keeping them hidden prevents the name clash during dynamic
     linking, as the symbol will not be visible from other shared
     objects or binaries.
     But I don't see why they should also be weak.

     As a matter of fact, removing the weak attribute seems to fix my
     problem (as far as I could test).
     What am I missing?

     Thank you again for your patience,
     Gerlando


     On 05/27/2014 04:58 PM, Woegerer, Paul wrote:

         On 05/27/2014 04:41 PM, Gerlando Falauto wrote:

             Hi Paul,

             thank you very much for sharing this.

             I had in the meantime run into the same suggestion by
             Henrik Wallin on a thread opened by Martin
             (https://gcc.gnu.org/ml/gcc-help/2014-05/msg00028.html).
             Further updates from Martin also suggest the issue is
             rather related to the OpenEmbedded toolchain.

             I was about to post the "opposite" of your patch, as I
             don't see the need to have those symbols as weak instead.
             In the end, doesn't weak only allow for a further
             re-definition? In this case we're only declaring it as
             extern, aren't we?
             Definition actually happens by magic, as far as I can tell.
             But please correct me if I got it all wrong.


         It's more complicated.

         You absolutely need those symbols to be declared as:

              .weak   __start___tracepoints_ptrs
              .weak   __stop___tracepoints_ptrs

         *and*

              .hidden __start___tracepoints_ptrs
              .hidden __stop___tracepoints_ptrs

         The reason is that you can have the same tracepoint provider
         be USED in several compilation units that will all become part
         of one and the same shared object (or executable).

         Then all those __start/stop___tracepoints_ptrs references in
         different compilation units should refer to the same
         __start/stop___tracepoints_ptrs definitions for the shared
         object (or executable) they are part of. This is required
         because the initialization of the tracepoints will only happen
         once per shared object (or executable) with the static ctor
         mechanism also defined in tracepoint.h

         HTH,
         Paul


             Thank you,
             Gerlando

             On 05/27/2014 04:32 PM, Woegerer, Paul wrote:

                 Hi Martin, Hi Gerlando,

                 this sounds a lot like the compiler bug I found
                 recently in Yocto 1.6 (reproducible on ARM, x86 and PPC)

                 The problem in my case is that the Yocto generated GCC
                 cross-compiler translates:

                 extern struct tracepoint * const __start___tracepoints_ptrs[]
                       __attribute__((weak, visibility("hidden")));
                 extern struct tracepoint * const __stop___tracepoints_ptrs[]
                       __attribute__((weak, visibility("hidden")));

                 incorrectly to assembly. For symbols that are declared
                 with

                 __attribute__((weak, visibility("hidden")));

                 and that are also declared extern, the following lines
                 are missing in the assembly:

                 .hidden __stop___tracepoints_ptrs
                 .hidden __start___tracepoints_ptrs

                 This causes __stop___tracepoints_ptrs and
                 __start___tracepoints_ptrs to be treated as ordinary
                 weak symbols instead of per-shared-object weak symbols.
                 That in turn causes the linker to resolve any such
                 symbols with the first definition of those symbols
                 that it can see (it will not constrain itself to only
                 consider definitions from within the same shared
                 object). The net result is that only one tracepoint
                 provider gets activated (the first one the linker sees)
                 instead of all the tracepoint providers used in
                 various source files.

                 To fix this I use the following lttng-ust workaround
                 (for now):

                 diff --git a/include/lttng/tracepoint.h b/include/lttng/tracepoint.h
                 index 66e2abd..50cef26 100644
                 --- a/include/lttng/tracepoint.h
                 +++ b/include/lttng/tracepoint.h
                 @@ -313,9 +313,11 @@ __tracepoints__destroy(void)
                     * (or for the whole main program).
                     */
                    extern struct tracepoint * const __start___tracepoints_ptrs[]
                 -       __attribute__((weak, visibility("hidden")));
                 +       __attribute__((weak));
                 +asm(".hidden __start___tracepoints_ptrs");
                    extern struct tracepoint * const __stop___tracepoints_ptrs[]
                 -       __attribute__((weak, visibility("hidden")));
                 +       __attribute__((weak));
                 +asm(".hidden __stop___tracepoints_ptrs");

                    /*
                     * When TRACEPOINT_PROBE_DYNAMIC_LINKAGE is defined, we do not emit a


                 Note that this issue is not reproducible with my GCC
                 on host:
                 gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux)
                 and also not with the latest Codebench 2014.05
                 ARM-Linux cross-toolchain.

                 --
                 Best,
                 Paul

                 On 05/27/2014 01:55 PM, Gerlando Falauto wrote:

                     Hi Martin,

                     I have been struggling for a while with this issue
                     (see the whole thread):

                     http://lists.lttng.org/pipermail/lttng-dev/2014-May/023035.html

                     and landed on the same conclusions as yours (found
                     your message by searching for
                     __start___tracepoints_ptr!).
                     So at least you're not alone!

                     So, did you ever manage to get any of your
                     questions answered:

                             1) Have you run into a problem like this?
                             Is there a known fix/workaround?
                             2) __start___tracepoints_ptrs is declared
                             as extern in tracepoint.h, but it is not
                             defined. This appears to be some sort of
                             undocumented linker magic.
                             http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html
                             is the only reference I could find. Do you
                             know where this behavior is documented or
                             specified (if at all)?
                             3) Do you know why the symbol visibility for
                             __start___tracepoints_ptrs changed between
                             4.6.4 and 4.7.2?


                     Thank you so much!
                     Gerlando

                     BTW, I'm also running GCC 4.7.2 (lttng-ust is
                     cross-compiled, the test
                     application is natively compiled).

                     On an x86_64 host running either GCC 4.4.6 or
                     4.4.7, the issue is not
                     observed.


                     On 04/30/2014 11:57 PM, Martin Ünsal wrote:

                         Incidentally I also asked for help on the GNU
                         linker-specific part
                         (question 2) here:

                         http://gcc.gnu.org/ml/gcc-help/2014-04/msg00164.html

                         Martin


                         On Wed, Apr 30, 2014 at 2:21 PM, Martin Ünsal
                         wrote:

                             Hi LTTng folks

                             I have a strange problem using LTTng-UST
                             on an ARM based platform. I have done some
                             diagnosis but I am running low on ideas
                             and was hoping for help from the experts.
                             I am using lttng-tools 2.2.0, lttng-ust
                             2.2.0, liburcu 0.8.1. I know these are old
                             but upgrading is easier said than done
                             unfortunately. I didn't see anything
                             related to this problem in relnotes,
                             mailing list traffic, or master branch,
                             but I could have missed something.

                             The problem showed up when I switched from
                             GCC 4.6.4 to 4.7.2. Conceptually, the
                             situation is that I have a single
                             executable, call it MyProgram, with two
                             plugins loaded at runtime with dlopen(),
                             let's call them libPlugin1.so and
                             libPlugin2.so. There are three different
                             LTTng-UST tracepoint providers, one each
                             for the executable and the two plugins.
                             With GCC 4.7.2, tracepoints in libPlugin1
                             stopped working. The tracepoints in
                             MyProgram and in libPlugin2 continue to
                             work correctly.

                             I have established without a doubt that
                             the toolchain upgrade is the cause of the
                             regression.

                             In the debugger, I confirmed that the
                             tracepoint for libPlugin1.so is being
                             executed, but
                             __tracepoint_##provider##___##name.state
                             is always 0 even when I enable the
                             tracepoint in lttng-tools. As a result the
                             tracepoint callback is not being invoked
                             when it should be. In MyProgram and
                             libPlugin2.so, the .state variable
                             correctly reflects whether the tracepoint
                             is enabled, and if the tracepoint is
                             enabled, the tracepoint callback is
                             invoked.

                             Next I set a breakpoint in
                             tracepoint_register_lib() and looked at
                             the tracepoints_start parameter.

                             1) With GCC 4.6.4 everything is as expected:
                                   a) tracepoint_register_lib() for
                                   MyProgram called with
                                   MyProgramProvider's
                                   __start___tracepoints_ptrs
                                   b) tracepoint_register_lib() after
                                   libPlugin1 dlopen() called with
                                   libPlugin1Provider's
                                   __start___tracepoints_ptrs
                                   c) tracepoint_register_lib() after
                                   libPlugin2 dlopen() called with
                                   libPlugin2Provider's
                                   __start___tracepoints_ptrs

                             2) With GCC 4.7.2 there is a problem:
                                   a) tracepoint_register_lib() for
                                   MyProgram called with
                                   MyProgramProvider's
                                   __start___tracepoints_ptrs
                                   b) tracepoint_register_lib() after
                                   libPlugin1 dlopen() called with
                                   MyProgramProvider's
                                   __start___tracepoints_ptrs
                                   (!!!! THIS IS WRONG !!!!)
                                   c) tracepoint_register_lib() after
                                   libPlugin2 dlopen() called with
                                   libPlugin2Provider's
                                   __start___tracepoints_ptrs

                             I looked at the symbol table for
                             libPlugin1.so to see if it would shed some
                             light on the problem.

                             1) With GCC 4.6.4:
                             # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
                             00025bb0 l       *ABS* 00000000 __start___tracepoints_ptrs
                             # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
                             00041eb4 l       *ABS* 00000000 __start___tracepoints_ptrs

                             2) With GCC 4.7.2:
                             # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
                             00025a90 g       __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
                             # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
                             00041eb4 g       __tracepoints_ptrs 00000000 __start___tracepoints_ptrs

                             My hypothesis at this point is that since
                             __start___tracepoints_ptrs changed from a
                             local to a global symbol, the dynamic
                             loader no longer knows how to select the
                             correct weak symbol. I cannot explain why
                             libPlugin2 still loads its provider
                             correctly, perhaps it is just getting
                             lucky.

                             A few questions come to mind...
                             1) Have you run into a problem like this?
                             Is there a known fix/workaround?
                             2) __start___tracepoints_ptrs is declared
                             as extern in tracepoint.h, but it is not
                             defined. This appears to be some sort of
                             undocumented linker magic.
                             http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html
                             is the only reference I could find. Do you
                             know where this behavior is documented or
                             specified (if at all)?
                             3) Do you know why the symbol visibility for
                             __start___tracepoints_ptrs changed between
                             4.6.4 and 4.7.2?

                             Thanks for any help. This is a real
                             puzzler for me.

                             Martin


                         _______________________________________________
                         lttng-dev mailing list
                         [email protected]
                         http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev



--
Paul Woegerer, SW Development Engineer
Sourcery Analyzer <http://go.mentor.com/sourceryanalyzer>
Mentor Graphics, Embedded Software Division


