Impala recently changed its default compiler from GCC 4.9.2 to GCC 7.5.0.
Here is some information about what changed:

1. The native toolchain packages are now accessed via
IMPALA_TOOLCHAIN_PACKAGES_HOME, which is now a subdirectory of
IMPALA_TOOLCHAIN. For the new GCC 7 code, it is
$IMPALA_TOOLCHAIN/toolchain-packages-gcc7.5.0. The older GCC 4.9.2 native
toolchain packages are left in place directly under IMPALA_TOOLCHAIN.
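
As a rough illustration of the layout described in item 1, the sketch below
derives the packages directory from a toolchain root. The function name
toolchain_packages_home is hypothetical and not part of Impala; the real
environment sets IMPALA_TOOLCHAIN_PACKAGES_HOME itself.

```shell
# Hypothetical helper, not part of Impala: print the GCC-specific packages
# directory under the toolchain root given as $1. The GCC version ($2)
# defaults to 7.5.0, matching the directory name given in this mail.
toolchain_packages_home() {
  # ${1%/} drops one trailing slash so the result has no double slash.
  printf '%s/toolchain-packages-gcc%s\n' "${1%/}" "${2:-7.5.0}"
}

# Example: toolchain_packages_home "$IMPALA_TOOLCHAIN"
```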

2. Impala now uses a newer GCC and libstdc++ than the ones shipped with
some of the Linux versions that we support (i.e. CentOS 7, Ubuntu 16), so it
is important that compiled code be able to find the toolchain's version of
libstdc++. Otherwise, it may look for symbols that are not present in
the system library. This generates errors like:
  undefined symbol: _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE
or
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found

In the past, LD_LIBRARY_PATH has been exported into the environment with
the system libstdc++ first. For example:
  export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
It is now recommended to remove this setting. LD_LIBRARY_PATH does not need
to be set in the environment, and it will lead to errors if it is set this
way. Most scripts in the Impala dev environment run binaries with the right
LD_LIBRARY_PATH to find libstdc++. When running a binary directly, the
bin/run-jvm-binary.sh wrapper will provide the right settings to get the
toolchain libstdc++ on the path. For example:
  bin/run-jvm-binary.sh be/build/latest/runtime/io/data-cache-test
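
To check whether an existing environment is affected by item 2, something
like the following sketch may help. Both function names are hypothetical and
not part of Impala. The first tests whether a directory appears in a
colon-separated search path such as LD_LIBRARY_PATH. The second tests whether
a library file contains a given version string such as GLIBCXX_3.4.22, on the
assumption that version names are stored as NUL-terminated strings in the
file.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated path list $2, with or without a trailing slash.
path_list_contains() {
  case ":${2:-}:" in
    *":${1%/}:"*|*":${1%/}/:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeed if file $1 contains the exact NUL-terminated
# string $2. NUL bytes are turned into newlines so that grep -x can match
# the whole string and GLIBCXX_3.4.2 does not match GLIBCXX_3.4.22.
library_has_version() {
  tr '\0' '\n' < "$1" | LC_ALL=C grep -qxF -- "$2"
}

# Example:
#   path_list_contains /usr/lib/x86_64-linux-gnu "${LD_LIBRARY_PATH:-}" \
#     && echo "system library directory is on LD_LIBRARY_PATH"
#   library_has_version /usr/lib/x86_64-linux-gnu/libstdc++.so.6 \
#     GLIBCXX_3.4.22 || echo "system libstdc++ lacks GLIBCXX_3.4.22"
```

If the first check succeeds, the exported LD_LIBRARY_PATH is the likely
cause of the errors above and can be removed as recommended.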

3. The impala-python virtualenv used to be located at infra/python/env. It
has moved to infra/python/env-gcc7.5.0. It should be possible to switch back
and forth between branches with and without GCC 7.

In case of any issues, please reach out.

Thanks,
Joe
