[
https://issues.apache.org/jira/browse/KUDU-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229073#comment-15229073
]
Adar Dembo commented on KUDU-1397:
----------------------------------
I'm aware of two ways to fix the thirdparty clang STL discrepancy, each of
which is a slightly different philosophical approach:
# Ensure we use the right STL header location when using thirdparty clang to
emit LLVM IR (in Kudu's codegen build).
# Ensure thirdparty clang always searches for STL headers and libraries
based on the compiler used to build it.
The OS X solution follows approach #1, while the el6 solution hews to #2. #1 is
generally simpler because we only need to worry about STL headers (for whatever
reason, the code we emit as LLVM IR doesn't need the libraries, or at least
doesn't need C++11 library symbols), but #2 allows us to use the thirdparty
clang for ASAN/TSAN builds, which can be desirable.
In theory, we should be able to ask the compiler used to build Kudu where its
STL is. This is relatively easy to do when all we're looking for is STL
headers; we can call {{$CXX \-E \-x c\+\+ \- \-v < /dev/null}} and examine the
contents of "#include <...>". If we're looking for both headers and libraries,
it's harder. With gcc, one can call {{gcc \-v}} and parse the "Configured with"
output for "\-\-prefix=<...>", but that's only useful as input for clang's
{{--gcc-toolchain}} command line option, or for a clang patch (similar to the
patch we use in el6). Alternatively, we could force users to provide the root
of their toolchain at the time that thirdparty is built, and use it to generate
the appropriate patch for clang (similar to the el6 patch).
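
As a rough illustration of the two probes described above, here is a minimal
Python sketch. It assumes the compiler prints the usual
"#include <...> search starts here:" / "End of search list." block and a
"Configured with:" line; the function names are hypothetical and are not part
of Kudu's build scripts.

```python
import os
import shlex
import subprocess


def query_compiler(cxx):
    """Run '$CXX -E -x c++ - -v' on empty input and return its stderr text."""
    result = subprocess.run(
        [cxx, "-E", "-x", "c++", "-", "-v"],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        # Force untranslated messages so the markers below match.
        env=dict(os.environ, LC_ALL="C"),
    )
    return result.stderr


def parse_include_dirs(verbose_output):
    """Return the directories listed in the '#include <...>' search block."""
    dirs = []
    in_block = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_block = True
        elif line.startswith("End of search list."):
            break
        elif in_block:
            entry = line.strip()
            # Skip OS X framework directories; they are not header search paths.
            if entry and not entry.endswith("(framework directory)"):
                dirs.append(entry)
    return dirs


def parse_gcc_prefix(gcc_v_output):
    """Return the --prefix value from gcc's 'Configured with:' line, or None."""
    marker = "Configured with:"
    for line in gcc_v_output.splitlines():
        if line.startswith(marker):
            for arg in shlex.split(line[len(marker):]):
                if arg.startswith("--prefix="):
                    return arg[len("--prefix="):]
    return None
```

Something like {{parse_include_dirs(query_compiler(os.environ["CXX"]))}} would
yield the header directories, while the result of {{parse_gcc_prefix}} is only
a candidate value for {{--gcc-toolchain}}; whether it points at a usable
toolchain root would still need to be checked.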
> Allow building safely with custom toolchains
> --------------------------------------------
>
> Key: KUDU-1397
> URL: https://issues.apache.org/jira/browse/KUDU-1397
> Project: Kudu
> Issue Type: Bug
> Components: build
> Affects Versions: 0.8.0
> Reporter: Adar Dembo
>
> Casey uncovered several issues when building Kudu with the Impala toolchain;
> this report attempts to capture them.
> The first and most important issue was a random SIGSEGV during a flush:
> {noformat}
> (gdb) bt
> #0 0x0000000000e82540 in kudu::CopyCellData<kudu::ColumnBlockCell,
> kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0,
> dst_arena=0x0)
> at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:79
> #1 0x0000000000e80e33 in kudu::CopyCell<kudu::ColumnBlockCell,
> kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0,
> dst_arena=0x0)
> at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:103
> #2 0x0000000000e7f647 in kudu::CopyRow<kudu::RowBlockRow, kudu::RowBlockRow,
> kudu::Arena> (src_row=..., dst_row=0x7ff9c637d870, dst_arena=0x0)
> at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:119
> #3 0x0000000000e76773 in kudu::tablet::FlushCompactionInput
> (input=0x3894f00, snap=..., out=0x7ff9c637dbf0)
> at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/compaction.cc:768
> #4 0x0000000000e23f5a in kudu::tablet::Tablet::DoCompactionOrFlush
> (this=0x395a840, input=..., mrs_being_flushed=0)
> at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:1221
> #5 0x0000000000e202b2 in kudu::tablet::Tablet::FlushInternal
> (this=0x395a840, input=..., old_ms=...) at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:744
> #6 0x0000000000e1f8f6 in kudu::tablet::Tablet::FlushUnlocked
> (this=0x395a840) at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:678
> #7 0x0000000000f1b3a3 in kudu::tablet::FlushMRSOp::Perform (this=0x38b9340)
> at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet_peer_mm_ops.cc:127
> #8 0x0000000000ea19d7 in kudu::MaintenanceManager::LaunchOp (this=0x3904360,
> op=0x38b9340) at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/maintenance_manager.cc:360
> #9 0x0000000000ea6502 in boost::_mfi::mf1<void, kudu::MaintenanceManager,
> kudu::MaintenanceOp*>::operator() (this=0x3d492a0, p=0x3904360, a1=0x38b9340)
> at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #10 0x0000000000ea6163 in
> boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>,
> boost::_bi::value<kudu::MaintenanceOp*> >::operator()<boost::_mfi::mf1<void,
> kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list0>
> (this=0x3d492b0, f=..., a=...) at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
> #11 0x0000000000ea5bed in boost::_bi::bind_t<void, boost::_mfi::mf1<void,
> kudu::MaintenanceManager, kudu::MaintenanceOp*>,
> boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>,
> boost::_bi::value<kudu::MaintenanceOp*> > >::operator() (this=0x3d492a0) at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
> #12 0x0000000000ea57ec in
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
> boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>,
> boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>,
> boost::_bi::value<kudu::MaintenanceOp*> > >, void>::invoke
> (function_obj_ptr=...) at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
> #13 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3c01838)
> at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
> #14 0x0000000001d73aa4 in kudu::FunctionRunnable::Run (this=0x3c01830) at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:47
> #15 0x0000000001d73062 in kudu::ThreadPool::DispatchThread (this=0x38c8340,
> permanent=true) at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:321
> #16 0x0000000001d76740 in boost::_mfi::mf1<void, kudu::ThreadPool,
> bool>::operator() (this=0x38f2d60, p=0x38c8340, a1=true)
> at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #17 0x0000000001d76375 in
> boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>,
> boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void,
> kudu::ThreadPool, bool>, boost::_bi::list0> (this=0x38f2d70, f=...,
> a=...) at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
> #18 0x0000000001d75eb7 in boost::_bi::bind_t<void, boost::_mfi::mf1<void,
> kudu::ThreadPool, bool>,
> boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>,
> boost::_bi::value<bool> > >::operator() (this=0x38f2d60)
> at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
> #19 0x0000000001d759e9 in
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
> boost::_mfi::mf1<void, kudu::ThreadPool, bool>,
> boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>,
> boost::_bi::value<bool> > >, void>::invoke (function_obj_ptr=...) at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
> #20 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3918028)
> at
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
> #21 0x0000000001d6ba4d in kudu::Thread::SuperviseThread (arg=0x3918000) at
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/thread.cc:580
> #22 0x00007ff9c7bfadc5 in start_thread () from /lib64/libpthread.so.0
> #23 0x00007ff9c6aca21d in clone () from /lib64/libc.so.6
> {noformat}
> Todd traced this to a build issue with codegen. Specifically, when using our
> thirdparty clang to convert precompiled.cc into LLVM IR, we expect that it's
> using the same libstdc++ used by the rest of the Kudu build. It turns out
> there's no such guarantee, and depending on the version discrepancy, there
> may be a [variety of
> issues|https://gcc.gnu.org/wiki/Cxx11AbiCompatibility#ABI_Changes], including
> at least one alignment change that could result in the kind of corruption
> that Casey is seeing.
> Let's walk through the various scenarios at play:
> # When building Kudu on a platform whose system libstdc++ supports C\+\+11,
> libstdc++ is expected to be found in */usr* regardless of the chosen
> compiler, be it the system's gcc, clang, or thirdparty's clang.
> # On el6, we call {{scl enable devtoolset-3}} before building Kudu. This puts
> a special build of gcc 4.9.2 on the PATH whose libstdc++ comes from
> */opt/rh/devtoolset-3/usr* rather than from the system itself. To avoid
> discrepancies, we patch thirdparty clang to use that same path when searching
> for headers and libraries, so we end up with the same libstdc++ for Kudu as
> for emitted LLVM IR.
> # On OSX, C\+\+ support comes by way of libc\+\+, with a location deep
> within XCode. This location is built into the system clang, which is also the
> compiler used to build Kudu. We don't patch thirdparty clang as on el6, so it
> can't find libc++ by default. However, Kudu adds {{-cxx-isystem <this XCode
> path>}} during the codegen build. In this way, the libc++ used in emitting
> LLVM IR is the same as what's used in the rest of Kudu.
> # Building with the Impala toolchain is similar to the el6 case except
> without the patch to thirdparty's clang. Nor can it be patched in the same
> way; the toolchain location varies from system to system. Without the patch,
> thirdparty's clang ends up using the system's libstdc++, which isn't
> guaranteed to be the same as the version in the toolchain, and can lead to
> the issues described above. This needs to be addressed.
> Separately, Casey ran into a build-time issue when building Kudu with the
> Impala toolchain on a platform that doesn't provide Python 2.7 (I think it
> was an el6 VM). On these platforms, Kudu builds its own Python 2.7 before
> building LLVM, as the latter depends on the former to build. The Python build
> failed with the following:
> {noformat}
> 17:22:35
> /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/bin/gcc
> -pthread -mno-avx2
> -Wl,-rpath,/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64,-rpath,'RIGIN/../lib64',-rpath,'RIGIN/../lib'
>
> -L/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64
> -Xlinker -export-dynamic -o python \
> 17:22:35 Modules/python.o \
> 17:22:35 libpython2.7.a -lpthread -ldl -lutil -lm
> 17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tmpnam':
> 17:22:35
> /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7631:
> warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
> 17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tempnam':
> 17:22:35
> /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7578:
> warning: the use of `tempnam' is dangerous, better use `mkstemp'
> 17:22:35 ./python -E -S -m sysconfig --generate-posix-vars ;\
> 17:22:35 if test $? -ne 0 ; then \
> 17:22:35 echo "generate-posix-vars failed" ; \
> 17:22:35 rm -f ./pybuilddir.txt ; \
> 17:22:35 exit 1 ; \
> 17:22:35 fi
> 17:22:35 Traceback (most recent call last):
> 17:22:35 File "./setup.py", line 33, in <module>
> 17:22:35 COMPILED_WITH_PYDEBUG = ('--with-pydebug' in
> sysconfig.get_config_var("CONFIG_ARGS"))
> 17:22:35 TypeError: argument of type 'NoneType' is not iterable
> 17:22:35 make: *** [sharedmods] Error 1
> {noformat}
> I investigated this briefly; there's something about the combination of the
> Python build logic and the environment variables emitted by the toolchain
> that causes CONFIG_ARGS to not get stored properly by sysconfig.
> For now Casey has worked around this second issue by forcing the build of
> Kudu to use Python 2.7 from the Impala toolchain, but we should get to the
> bottom of this second issue as well.
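
The TypeError in the Python build log above is what a membership test against
None produces. The sketch below only illustrates that mechanism, assuming
sysconfig.get_config_var("CONFIG_ARGS") returned None; the helper names are
hypothetical, and guarding the check would hide the symptom rather than
explain why CONFIG_ARGS was not stored.

```python
def unguarded_check(config_args):
    """Same shape as the failing line in setup.py: fails when given None."""
    return "--with-pydebug" in config_args


def guarded_check(config_args):
    """Treat a missing CONFIG_ARGS value as an empty string."""
    return "--with-pydebug" in (config_args or "")


def reproduces_type_error():
    """Return True if the unguarded check raises TypeError for None."""
    try:
        unguarded_check(None)
    except TypeError:
        return True
    return False
```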