[ 
https://issues.apache.org/jira/browse/KUDU-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated KUDU-1397:
-------------------------------------
    Target Version/s: Backlog  (was: 1.5.0)

> Allow building safely with custom toolchains
> --------------------------------------------
>
>                 Key: KUDU-1397
>                 URL: https://issues.apache.org/jira/browse/KUDU-1397
>             Project: Kudu
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.8.0
>            Reporter: Adar Dembo
>
> Casey uncovered several issues when building Kudu with the Impala toolchain; 
> this report attempts to capture them.
> The first and most important issue was a random SIGSEGV during a flush:
> {noformat}
> (gdb) bt
> #0 0x0000000000e82540 in kudu::CopyCellData<kudu::ColumnBlockCell, 
> kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, 
> dst_arena=0x0)
> at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:79
> #1 0x0000000000e80e33 in kudu::CopyCell<kudu::ColumnBlockCell, 
> kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, 
> dst_arena=0x0)
> at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:103
> #2 0x0000000000e7f647 in kudu::CopyRow<kudu::RowBlockRow, kudu::RowBlockRow, 
> kudu::Arena> (src_row=..., dst_row=0x7ff9c637d870, dst_arena=0x0)
> at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:119
> #3  0x0000000000e76773 in kudu::tablet::FlushCompactionInput 
> (input=0x3894f00, snap=..., out=0x7ff9c637dbf0)
>     at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/compaction.cc:768
> #4  0x0000000000e23f5a in kudu::tablet::Tablet::DoCompactionOrFlush 
> (this=0x395a840, input=..., mrs_being_flushed=0)
>     at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:1221
> #5  0x0000000000e202b2 in kudu::tablet::Tablet::FlushInternal 
> (this=0x395a840, input=..., old_ms=...) at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:744
> #6  0x0000000000e1f8f6 in kudu::tablet::Tablet::FlushUnlocked 
> (this=0x395a840) at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:678
> #7  0x0000000000f1b3a3 in kudu::tablet::FlushMRSOp::Perform (this=0x38b9340) 
> at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet_peer_mm_ops.cc:127
> #8  0x0000000000ea19d7 in kudu::MaintenanceManager::LaunchOp (this=0x3904360, 
> op=0x38b9340) at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/maintenance_manager.cc:360
> #9  0x0000000000ea6502 in boost::_mfi::mf1<void, kudu::MaintenanceManager, 
> kudu::MaintenanceOp*>::operator() (this=0x3d492a0, p=0x3904360, a1=0x38b9340)
>     at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #10 0x0000000000ea6163 in 
> boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, 
> boost::_bi::value<kudu::MaintenanceOp*> >::operator()<boost::_mfi::mf1<void, 
> kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list0> 
> (this=0x3d492b0, f=..., a=...) at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
> #11 0x0000000000ea5bed in boost::_bi::bind_t<void, boost::_mfi::mf1<void, 
> kudu::MaintenanceManager, kudu::MaintenanceOp*>, 
> boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, 
> boost::_bi::value<kudu::MaintenanceOp*> > >::operator() (this=0x3d492a0) at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
> #12 0x0000000000ea57ec in 
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, 
> boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, 
> boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, 
> boost::_bi::value<kudu::MaintenanceOp*> > >, void>::invoke 
> (function_obj_ptr=...) at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
> #13 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3c01838) 
> at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
> #14 0x0000000001d73aa4 in kudu::FunctionRunnable::Run (this=0x3c01830) at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:47
> #15 0x0000000001d73062 in kudu::ThreadPool::DispatchThread (this=0x38c8340, 
> permanent=true) at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:321
> #16 0x0000000001d76740 in boost::_mfi::mf1<void, kudu::ThreadPool, 
> bool>::operator() (this=0x38f2d60, p=0x38c8340, a1=true)
>     at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #17 0x0000000001d76375 in 
> boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, 
> boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, 
> kudu::ThreadPool, bool>, boost::_bi::list0> (this=0x38f2d70, f=...,
>     a=...) at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
> #18 0x0000000001d75eb7 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, 
> kudu::ThreadPool, bool>, 
> boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, 
> boost::_bi::value<bool> > >::operator() (this=0x38f2d60)
>     at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
> #19 0x0000000001d759e9 in 
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, 
> boost::_mfi::mf1<void, kudu::ThreadPool, bool>, 
> boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, 
> boost::_bi::value<bool> > >, void>::invoke (function_obj_ptr=...) at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
> #20 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3918028) 
> at 
> /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
> #21 0x0000000001d6ba4d in kudu::Thread::SuperviseThread (arg=0x3918000) at 
> /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/thread.cc:580
> #22 0x00007ff9c7bfadc5 in start_thread () from /lib64/libpthread.so.0
> #23 0x00007ff9c6aca21d in clone () from /lib64/libc.so.6
> {noformat}
> Todd traced this to a build issue with codegen. Specifically, when using our 
> thirdparty clang to convert precompiled.cc into LLVM IR, we expect that it's 
> using the same libstdc++ used by the rest of the Kudu build. It turns out 
> there's no such guarantee, and depending on the version discrepancy, there 
> may be a [variety of 
> issues|https://gcc.gnu.org/wiki/Cxx11AbiCompatibility#ABI_Changes], including 
> at least one alignment change that could result in the kind of corruption 
> that Casey is seeing.
> Let's walk through the various scenarios at play:
> # When building Kudu on a platform whose system libstdc++ supports C\+\+11, 
> libstdc++ is expected to be found in */usr* regardless of the chosen 
> compiler, be it the system's gcc, clang, or thirdparty's clang.
> # On el6, we call {{scl enable devtoolset-3}} before building Kudu. This puts 
> a special build of gcc 4.9.2 on the PATH whose libstdc++ comes from 
> */opt/rh/devtoolset-3/usr* rather than from the system itself. To avoid 
> discrepancies, we patch thirdparty clang to use that same path when searching 
> for headers and libraries, so we end up with the same libstdc++ for Kudu as 
> for emitted LLVM IR.
> # On OSX, C\+\+ support comes by way of libc\+\+, with a location deep 
> within Xcode. This location is built into the system clang, which is also the 
> compiler used to build Kudu. We don't patch thirdparty clang as on el6, so it 
> can't find libc++ by default. However, Kudu adds {{-cxx-isystem <this Xcode 
> path>}} during the codegen build. In this way, the libc++ used in emitting 
> LLVM IR is the same as what's used in the rest of Kudu.
> # Building with the Impala toolchain is similar to the el6 case except 
> without the patch to thirdparty's clang. Nor can it be patched in the same 
> way; the toolchain location varies from system to system. Without the patch, 
> thirdparty's clang ends up using the system's libstdc++, which isn't 
> guaranteed to be the same as the version in the toolchain, and can lead to 
> the issues described above. This needs to be addressed.
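
One way to spot the mismatch described in the scenarios above is to ask each compiler driver where it searches for C++ headers and compare the answers. The following is a minimal sketch, not part of the Kudu build: the function names are made up for illustration, it assumes both compilers are gcc- or clang-compatible drivers that print their search list on stderr when run with -v -E, and the "/c++/" filter is only a heuristic for recognizing a libstdc++ or libc++ header directory.

```python
import subprocess


def parse_include_dirs(verbose_output):
    """Return the <...> include search directories from `cc -v -E` output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list and line.startswith(" "):
            # Entries are indented by one space; macOS may append a
            # "(framework directory)" note that is not part of the path.
            dirs.append(line.strip().replace(" (framework directory)", ""))
    return dirs


def cxx_include_dirs(compiler):
    """Ask a gcc- or clang-compatible driver where it looks for C++ headers."""
    result = subprocess.run(
        [compiler, "-v", "-E", "-x", "c++", "/dev/null"],
        capture_output=True, text=True, check=True,
    )
    # Both gcc and clang print the search list on stderr.
    return parse_include_dirs(result.stderr)


def stdlib_dirs(dirs):
    """Keep directories that look like C++ standard library headers (heuristic)."""
    return [d for d in dirs if "/c++/" in d]
```

Calling stdlib_dirs(cxx_include_dirs(...)) once with the compiler used for the main build and once with thirdparty's clang, then comparing the two lists, would show whether the codegen step is picking up a different standard library. Differing lists do not prove an ABI break, but matching lists rule out this particular cause.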
> Separately, Casey ran into a build-time issue when building Kudu with the 
> Impala toolchain on a platform that doesn't provide Python 2.7 (I think it 
> was an el6 VM). On these platforms, Kudu builds its own Python 2.7 before 
> building LLVM, as the latter depends on the former to build. The Python build 
> failed with the following:
> {noformat}
> 17:22:35 
> /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/bin/gcc
>  -pthread -mno-avx2 
> -Wl,-rpath,/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64,-rpath,'RIGIN/../lib64',-rpath,'RIGIN/../lib'
>  
> -L/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64
>  -Xlinker -export-dynamic -o python \
> 17:22:35                      Modules/python.o \
> 17:22:35                      libpython2.7.a -lpthread -ldl  -lutil   -lm  
> 17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tmpnam':
> 17:22:35 
> /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7631:
>  warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
> 17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tempnam':
> 17:22:35 
> /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7578:
>  warning: the use of `tempnam' is dangerous, better use `mkstemp'
> 17:22:35 ./python -E -S -m sysconfig --generate-posix-vars ;\
> 17:22:35      if test $? -ne 0 ; then \
> 17:22:35              echo "generate-posix-vars failed" ; \
> 17:22:35              rm -f ./pybuilddir.txt ; \
> 17:22:35              exit 1 ; \
> 17:22:35      fi
> 17:22:35 Traceback (most recent call last):
> 17:22:35   File "./setup.py", line 33, in <module>
> 17:22:35     COMPILED_WITH_PYDEBUG = ('--with-pydebug' in 
> sysconfig.get_config_var("CONFIG_ARGS"))
> 17:22:35 TypeError: argument of type 'NoneType' is not iterable
> 17:22:35 make: *** [sharedmods] Error 1
> {noformat}
> I investigated this briefly; there's something about the combination of the 
> Python build logic and the environment variables emitted by the toolchain 
> that causes CONFIG_ARGS to not get stored properly by sysconfig. 
> For now Casey has worked around this second issue by forcing the build of 
> Kudu to use Python 2.7 from the Impala toolchain, but we should get to the 
> bottom of this second issue as well.
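
For readers unfamiliar with the traceback above: the TypeError arises because sysconfig.get_config_var("CONFIG_ARGS") returned None, and a membership test against None is not allowed. The sketch below only reproduces that failure mode in isolation. Both function names are hypothetical, and the guarded variant is shown for contrast rather than as a proposed fix, since it would hide the fact that CONFIG_ARGS was never recorded.

```python
def unguarded_check(config_args):
    """Same membership test as the setup.py line quoted in the log."""
    # Raises TypeError when config_args is None.
    return "--with-pydebug" in config_args


def guarded_check(config_args):
    """Variant that treats a missing CONFIG_ARGS as an empty string."""
    # `None or ""` evaluates to "", so the test returns False instead of raising.
    return "--with-pydebug" in (config_args or "")
```

With config_args set to None, unguarded_check raises the same TypeError seen in the build log, while guarded_check returns False. The open question in the report, why the value is missing under the toolchain's environment, is unaffected by either form.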



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
