Impala recently changed its default compiler from GCC 4.9.2 to GCC 7.5.0. Here is some information about what changed:
1. The native toolchain packages are now accessed via IMPALA_TOOLCHAIN_PACKAGES_HOME, which is now a subdirectory of IMPALA_TOOLCHAIN. For the new GCC 7 code, it is $IMPALA_TOOLCHAIN/toolchain-packages-gcc7.5.0. The older GCC 4.9.2 native toolchain packages are left in place directly under IMPALA_TOOLCHAIN.

2. Impala now uses a newer GCC and libstdc++ than the ones shipped with certain Linux versions that we support (i.e. CentOS 7, Ubuntu 16). It is therefore important that compiled code be able to find the toolchain's version of libstdc++. Otherwise, it may look for symbols that are not present in the system library, producing errors like:

       undefined symbol: _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE

   or

       /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found

   In the past, LD_LIBRARY_PATH was exported into the environment with the system libstdc++ directory first. For example:

       export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

   It is now recommended to remove this setting. LD_LIBRARY_PATH does not need to be set in the environment, and setting it this way will lead to errors. Most scripts in the Impala dev environment run binaries with the right LD_LIBRARY_PATH to find the toolchain libstdc++. When running a binary directly, use the bin/run-jvm-binary.sh wrapper, which provides the right settings to put the toolchain libstdc++ on the library path. For example:

       bin/run-jvm-binary.sh be/build/latest/runtime/io/data-cache-test

3. The impala-python virtualenv used to be located at infra/python/env. It has moved to infra/python/env-gcc7.5.0.

It should be possible to switch back and forth between branches with and without GCC 7. In case of any issues, please reach out.

Thanks,
Joe
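To spot the LD_LIBRARY_PATH problem described in item 2, a small bash sketch like the one below may help. It flags an LD_LIBRARY_PATH that lists a system library directory and checks whether a given libstdc++.so.6 defines a particular GLIBCXX symbol version, such as the GLIBCXX_3.4.22 from the error message above. The function names and the list of "system" directories are hypothetical assumptions, not part of the Impala dev scripts; adjust them for your distribution.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and the directory list are assumptions,
# not part of the Impala dev scripts.

# Returns 0 (and prints a warning to stderr) if the colon-separated path in $1
# lists a system library directory. Such an entry can make binaries load the
# older system libstdc++ instead of the toolchain one. The directories below
# are an assumed, non-exhaustive list; adjust them for your distribution.
ld_path_has_system_libdir() {
  local -a entries
  local entry
  # Split on ':' only; read -a avoids glob expansion of the entries.
  IFS=':' read -r -a entries <<< "${1-}"
  for entry in "${entries[@]}"; do
    # Strip one trailing slash so "/usr/lib/x86_64-linux-gnu/" also matches.
    case "${entry%/}" in
      /lib|/lib64|/usr/lib|/usr/lib64|/lib/x86_64-linux-gnu|/usr/lib/x86_64-linux-gnu)
        echo "warning: LD_LIBRARY_PATH contains system dir '${entry}'; consider 'unset LD_LIBRARY_PATH'" >&2
        return 0
        ;;
    esac
  done
  return 1
}

# Returns 0 if the library file in $1 contains the symbol version string in
# $2 (for example GLIBCXX_3.4.22) as a whole word. This is similar to the
# usual `strings libstdc++.so.6 | grep GLIBCXX` check, but uses grep -a so
# binutils is not needed. -F treats the dots literally; -w keeps
# GLIBCXX_3.4.2 from matching GLIBCXX_3.4.22.
libstdcxx_has_version() {
  local lib="$1" version="$2"
  grep -qawF -- "$version" "$lib"
}
```

For example, `ld_path_has_system_libdir "${LD_LIBRARY_PATH-}"` prints a warning if the old export line is still in your shell profile, and `libstdcxx_has_version /usr/lib/x86_64-linux-gnu/libstdc++.so.6 GLIBCXX_3.4.22 || echo "system libstdc++ lacks GLIBCXX_3.4.22"` shows whether the system library is too old for binaries built with the new toolchain.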