On 24 July 2017 at 13:58, Todd Lipcon <[email protected]> wrote:

> On Mon, Jul 24, 2017 at 1:47 PM, Henry Robinson <[email protected]> wrote:
>
> > Thanks for the asan pointer - I'll give it a go.
> >
> > My understanding of linking isn't deep, but my working theory has been
> that
> > the complications have been caused by glog getting linked twice - once
> > statically (possibly into libkudu.so), and once dynamically (via everyone
> > else).
> >
>
> In libkudu_client.so, we use a linker script to ensure that we don't leak
> glog/gflags/etc symbols. Those are all listed as 'local' in
> src/kudu/client/symbols.map. We also have a unit test
> 'client_symbol-test.sh' which uses nm to dump the list of symbols and make
> sure that they all non-local non-weak symbols are under the 'kudu::'
> namespace.
>
> So it's possible that something's getting linked twice but I'd be somewhat
> surprised if it's from the Kudu client.
>
>
Good to know, thanks.

ASAN hasn't turned up anything yet - so does anyone have an opinion about
changing Jenkins' linking strategy for now?


> -Todd
>
>
> >
> > I would think that could lead to one or both of the issues you linked to.
> >
> >
> > On 24 July 2017 at 13:39, Todd Lipcon <[email protected]> wrote:
> >
> > > Is it possible that the issue here is due to a "one definition rule"
> > > violation? eg something like
> > > https://github.com/google/sanitizers/wiki/AddressSanitizerOn
> > > eDefinitionRuleViolation
> > > Another similar thing is described here:
> > > https://github.com/google/sanitizers/wiki/AddressSanitizerIn
> > > itializationOrderFiasco
> > >
> > > ASAN with the appropriate flags might help expose if one of the above
> is
> > > related.
> > >
> > > I wonder whether it is a kind of coincidence that it is fine in a
> static
> > > build but causes problems in dynamic, and at some point the static link
> > > order may slightly shift, causing another new subtle bug.
> > >
> > >
> > >
> > > On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson <[email protected]>
> > wrote:
> > >
> > > > We've started seeing isolated incidences of IMPALA-5702 during GVOs,
> > > where
> > > > a custom cluster test fails by throwing an exception during locale
> > > > handling.
> > > >
> > > > I've been able to reproduce this locally, but only with shared
> linking
> > > > enabled (which makes sense since the issue is symptomatic of a global
> > > c'tor
> > > > not getting called the right number of times).
> > > >
> > > > It's probable that my patch for IMPALA-5659 exposed this (since it
> > > forced a
> > > > more correct linking strategy for thirdparty libraries when dynamic
> > > linking
> > > > was enabled), but it looks to me at first glance like there were
> latent
> > > > dynamic linking bugs that we weren't getting hit by. Fixing
> IMPALA-5702
> > > > will probably take a while, and I don't think we should hold up GVOs
> or
> > > put
> > > > them at risk.
> > > >
> > > > So there are two options:
> > > >
> > > > 1. Revert IMPALA-5659
> > > >
> > > > 2. Switch GVO to static linking
> > > >
> > > > IMPALA-5659 is important to commit the kudu util library, which is
> > needed
> > > > for the KRPC work. Without it, shared linking doesn't work *at all*
> > when
> > > > the kudu util library is committed.
> > > >
> > > > Static linking doesn't take much longer in my unscientific
> > measurements,
> > > > and is closer to how Impala is actually used. In the interest of
> > forward
> > > > progress I'd like to try switching ubuntu-14.04-from-scratch to use
> > > static
> > > > linking while I work on IMPALA-5702.
> > > >
> > > > What does everyone else think?
> > > >
> > > > Henry
> > > >
> > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Reply via email to