Adar Dembo created KUDU-1397:
--------------------------------

             Summary: Allow building safely with custom toolchains
                 Key: KUDU-1397
                 URL: https://issues.apache.org/jira/browse/KUDU-1397
             Project: Kudu
          Issue Type: Bug
          Components: build
    Affects Versions: 0.8.0
            Reporter: Adar Dembo


Casey uncovered several issues when building Kudu with the Impala toolchain; 
this report attempts to capture them.

The first and most important issue was a random SIGSEGV during a flush:
{noformat}
(gdb) bt
#0 0x0000000000e82540 in kudu::CopyCellData<kudu::ColumnBlockCell, 
kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:79
#1 0x0000000000e80e33 in kudu::CopyCell<kudu::ColumnBlockCell, 
kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:103
#2 0x0000000000e7f647 in kudu::CopyRow<kudu::RowBlockRow, kudu::RowBlockRow, 
kudu::Arena> (src_row=..., dst_row=0x7ff9c637d870, dst_arena=0x0)
at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:119
#3  0x0000000000e76773 in kudu::tablet::FlushCompactionInput (input=0x3894f00, 
snap=..., out=0x7ff9c637dbf0)
    at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/compaction.cc:768
#4  0x0000000000e23f5a in kudu::tablet::Tablet::DoCompactionOrFlush 
(this=0x395a840, input=..., mrs_being_flushed=0)
    at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:1221
#5  0x0000000000e202b2 in kudu::tablet::Tablet::FlushInternal (this=0x395a840, 
input=..., old_ms=...) at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:744
#6  0x0000000000e1f8f6 in kudu::tablet::Tablet::FlushUnlocked (this=0x395a840) 
at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:678
#7  0x0000000000f1b3a3 in kudu::tablet::FlushMRSOp::Perform (this=0x38b9340) at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet_peer_mm_ops.cc:127
#8  0x0000000000ea19d7 in kudu::MaintenanceManager::LaunchOp (this=0x3904360, 
op=0x38b9340) at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/maintenance_manager.cc:360
#9  0x0000000000ea6502 in boost::_mfi::mf1<void, kudu::MaintenanceManager, 
kudu::MaintenanceOp*>::operator() (this=0x3d492a0, p=0x3904360, a1=0x38b9340)
    at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
#10 0x0000000000ea6163 in 
boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, 
boost::_bi::value<kudu::MaintenanceOp*> >::operator()<boost::_mfi::mf1<void, 
kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list0> 
(this=0x3d492b0, f=..., a=...) at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
#11 0x0000000000ea5bed in boost::_bi::bind_t<void, boost::_mfi::mf1<void, 
kudu::MaintenanceManager, kudu::MaintenanceOp*>, 
boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, 
boost::_bi::value<kudu::MaintenanceOp*> > >::operator() (this=0x3d492a0) at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
#12 0x0000000000ea57ec in 
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, 
boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, 
boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, 
boost::_bi::value<kudu::MaintenanceOp*> > >, void>::invoke 
(function_obj_ptr=...) at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
#13 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3c01838) 
at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
#14 0x0000000001d73aa4 in kudu::FunctionRunnable::Run (this=0x3c01830) at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:47
#15 0x0000000001d73062 in kudu::ThreadPool::DispatchThread (this=0x38c8340, 
permanent=true) at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:321
#16 0x0000000001d76740 in boost::_mfi::mf1<void, kudu::ThreadPool, 
bool>::operator() (this=0x38f2d60, p=0x38c8340, a1=true)
    at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
#17 0x0000000001d76375 in 
boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> 
>::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, 
boost::_bi::list0> (this=0x38f2d70, f=...,
    a=...) at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
#18 0x0000000001d75eb7 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, 
kudu::ThreadPool, bool>, 
boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> 
> >::operator() (this=0x38f2d60)
    at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
#19 0x0000000001d759e9 in 
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, 
boost::_mfi::mf1<void, kudu::ThreadPool, bool>, 
boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> 
> >, void>::invoke (function_obj_ptr=...) at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
#20 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3918028) 
at 
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
#21 0x0000000001d6ba4d in kudu::Thread::SuperviseThread (arg=0x3918000) at 
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/thread.cc:580
#22 0x00007ff9c7bfadc5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007ff9c6aca21d in clone () from /lib64/libc.so.6
{noformat}

Todd traced this to a build issue with codegen. Specifically, when using our 
thirdparty clang to convert precompiled.cc into LLVM IR, we expect that it's 
using the same libstdc++ used by the rest of the Kudu build. It turns out 
there's no such guarantee, and depending on the version discrepancy, there may 
be a [variety of 
issues|https://gcc.gnu.org/wiki/Cxx11AbiCompatibility#ABI_Changes], including 
at least one alignment change that could result in the kind of corruption that 
Casey is seeing.

Let's walk through the various scenarios at play:
# When building Kudu on a platform whose system libstdc++ supports C\+\+11, 
libstdc++ is expected to be found in */usr* regardless of the chosen compiler, 
be it the system's gcc, clang, or thirdparty's clang.
# On el6, we call {{scl enable devtoolset-3}} before building Kudu. This puts a 
special build of gcc 4.9.2 on the PATH whose libstdc++ comes from 
*/opt/rh/devtoolset-3/usr* rather than from the system itself. To avoid 
discrepancies, we patch thirdparty clang to use that same path when searching 
for headers and libraries, so we end up with the same libstdc++ for Kudu as for 
emitted LLVM IR.
# On OSX, C\+\+ support comes by way of libc\+\+, with a location deep 
within XCode. This location is built into the system clang, which is also the 
compiler used to build Kudu. We don't patch thirdparty clang as on el6, so it 
can't find libc++ by default. However, Kudu adds {{-cxx-isystem <this XCode 
path>}} during the codegen build. In this way, the libc++ used in emitting LLVM 
IR is the same as what's used in the rest of Kudu.
# Building with the Impala toolchain is similar to the el6 case except without 
the patch to thirdparty's clang. Nor can it be patched in the same way; the 
toolchain location varies from system to system. Without the patch, 
thirdparty's clang ends up using the system's libstdc++, which isn't guaranteed 
to be the same as the version in the toolchain, and can lead to the issues 
described above. This needs to be addressed.
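One way to spot the discrepancy described in the scenarios above is to compare the C\+\+ header search paths that each compiler reports. The following is a minimal sketch, not part of Kudu's build: it assumes both compilers accept {{-E -x c++ -v}} and print their search list to stderr between the usual "search starts here" and "End of search list." markers (gcc and clang both do). The function names are hypothetical, and the "c++" substring filter is a heuristic for picking out the standard library headers.

```python
import subprocess

START = "#include <...> search starts here:"
END = "End of search list."


def parse_search_dirs(verbose_output):
    """Extract the <...> include directories from `-v` preprocessor output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped == START:
            in_list = True
        elif stripped == END:
            break
        elif in_list and stripped:
            dirs.append(stripped)
    return dirs


def cxx_header_dirs(compiler):
    """Run the compiler on empty input and return its C++ header directories."""
    proc = subprocess.run(
        [compiler, "-E", "-x", "c++", "-v", "-"],
        input="",
        capture_output=True,
        text=True,
        check=False,
    )
    # Heuristic: standard library headers usually live under a "c++" directory.
    return [d for d in parse_search_dirs(proc.stderr) if "c++" in d]


def same_cxx_headers(compiler_a, compiler_b):
    """True if both compilers resolve C++ headers from the same directories."""
    return cxx_header_dirs(compiler_a) == cxx_header_dirs(compiler_b)
```

Calling {{same_cxx_headers}} with the path to the toolchain's gcc and the path to thirdparty's clang would return False in the mismatched case described in the last scenario. Matching header directories do not prove that the same shared library is used at link time or run time, so this is only a first check.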

Separately, Casey ran into a build-time issue when building Kudu with the 
Impala toolchain on a platform that doesn't provide Python 2.7 (I think it was 
an el6 VM). On these platforms, Kudu builds its own Python 2.7 before building 
LLVM, as the latter depends on the former to build. The Python build failed 
with the following:
{noformat}
17:22:35 
/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/bin/gcc
 -pthread -mno-avx2 
-Wl,-rpath,/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64,-rpath,'RIGIN/../lib64',-rpath,'RIGIN/../lib'
 
-L/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64
 -Xlinker -export-dynamic -o python \
17:22:35                        Modules/python.o \
17:22:35                        libpython2.7.a -lpthread -ldl  -lutil   -lm  
17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tmpnam':
17:22:35 
/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7631:
 warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tempnam':
17:22:35 
/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7578:
 warning: the use of `tempnam' is dangerous, better use `mkstemp'
17:22:35 ./python -E -S -m sysconfig --generate-posix-vars ;\
17:22:35        if test $? -ne 0 ; then \
17:22:35                echo "generate-posix-vars failed" ; \
17:22:35                rm -f ./pybuilddir.txt ; \
17:22:35                exit 1 ; \
17:22:35        fi
17:22:35 Traceback (most recent call last):
17:22:35   File "./setup.py", line 33, in <module>
17:22:35     COMPILED_WITH_PYDEBUG = ('--with-pydebug' in 
sysconfig.get_config_var("CONFIG_ARGS"))
17:22:35 TypeError: argument of type 'NoneType' is not iterable
17:22:35 make: *** [sharedmods] Error 1
{noformat}

I investigated this briefly; there's something about the combination of the 
Python build logic and the environment variables emitted by the toolchain that 
causes CONFIG_ARGS to not get stored properly by sysconfig. 

For now Casey has worked around this second issue by forcing the Kudu build to 
use Python 2.7 from the Impala toolchain, but we should still get to the bottom 
of it.
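The {{TypeError}} in the log above comes from applying {{in}} to the {{None}} that {{sysconfig.get_config_var("CONFIG_ARGS")}} returns when the variable was never recorded. The sketch below only illustrates that failure mode and how to detect it; the helper name is hypothetical, and tolerating {{None}} this way would hide the root cause rather than fix it.

```python
import sysconfig


def compiled_with_pydebug(config_args):
    """Return True if '--with-pydebug' appears in CONFIG_ARGS; tolerate None."""
    return "--with-pydebug" in (config_args or "")


config_args = sysconfig.get_config_var("CONFIG_ARGS")
if config_args is None:
    # This is the condition that makes the unguarded check in setup.py raise.
    print("CONFIG_ARGS is unset; sysconfig data is missing or incomplete")
print(compiled_with_pydebug(config_args))
```

Running this with the freshly built {{./python}} would show whether CONFIG_ARGS was lost before {{setup.py}} ever ran.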



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
