[
https://issues.apache.org/jira/browse/IMPALA-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-6291.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.11.0
IMPALA-6291: disable AVX512 codegen in LLVM
Adds a whitelist of LLVM CPU attributes that I know we routinely
test Impala with. This excludes the problematic AVX512 attributes as
well as some other flags we don't test with - e.g. AMD-only
instructions, NVM-related instructions, etc. We're unlikely to get
significant benefit from these instruction set extensions without
explicitly using them via intrinsics.
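
For illustration only, the whitelisting idea can be sketched roughly as
follows. This is not the code from the patch: the function name
ApplyCpuAttrWhitelistSketch, the caller-supplied whitelist, and the choice
to turn non-whitelisted attributes into explicit "-name" entries are all
assumptions made for the example. It also assumes the attributes arrive in
LLVM's "+name" / "-name" string form.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not Impala's actual code: keeps whitelisted CPU
// attributes and turns every other enabled attribute into a disabled one.
// Each input attribute is expected in LLVM's "+name" / "-name" form.
std::vector<std::string> ApplyCpuAttrWhitelistSketch(
    const std::vector<std::string>& cpu_attrs,
    const std::set<std::string>& whitelist) {
  std::vector<std::string> result;
  result.reserve(cpu_attrs.size());
  for (const std::string& attr : cpu_attrs) {
    assert(!attr.empty() && (attr[0] == '+' || attr[0] == '-'));
    const std::string name = attr.substr(1);
    if (attr[0] == '-' || whitelist.count(name) > 0) {
      result.push_back(attr);  // already disabled, or enabled and whitelisted
    } else {
      result.push_back("-" + name);  // enabled but not whitelisted: force off
    }
  }
  return result;
}
```

With a whitelist of {"sse4.2", "avx2"}, an input of "+sse4.2", "+avx512f",
"-xop", "+avx2" would keep the SSE4.2 and AVX2 entries, leave "-xop" as is,
and rewrite "+avx512f" to "-avx512f". Disabling explicitly, rather than
dropping the entry, is a design choice in this sketch so that the result
does not depend on what the code generator enables by default.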
Testing:
Ran core tests on a system with AVX512 support with a prototype patch
that disabled only the AVX512 flags. Added a backend test to make sure
that the whitelisting is working as expected.
Change-Id: Ic7c3ee3e370bafc50d855113485b7e6925f7bf6a
Reviewed-on: http://gerrit.cloudera.org:8080/8802
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins
> Various crashes and incorrect results on CPUs with AVX512
> ---------------------------------------------------------
>
> Key: IMPALA-6291
> URL: https://issues.apache.org/jira/browse/IMPALA-6291
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0,
> Impala 2.10.0, Impala 2.11.0
> Environment: Ubuntu 16.04, M5.4xlarge
> Reporter: Jim Apple
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: correctness, crash
> Fix For: Impala 2.11.0
>
>
> M5 and C5 instances use a different hypervisor than M4 and C4. In EC2 C5 and
> M5 instances, data loading fails. An interesting snippet from the end of an
> impalad log:
> {noformat}
> I1207 04:12:07.922456 19933 coordinator.cc:99] Exec()
> query_id=944ead2f178cf67e:1755131f00000000 stmt=CREATE TABLE
> tmp_orders_string AS
> SELECT STRAIGHT_JOIN
> o_orderkey, o_custkey, o_orderstatus, o_totalprice, o_orderdate,
> o_orderpriority, o_clerk, o_shippriority, o_comment,
> GROUP_CONCAT(
> CONCAT(
> CAST(l_partkey AS STRING), '\005',
> CAST(l_suppkey AS STRING), '\005',
> CAST(l_linenumber AS STRING), '\005',
> CAST(l_quantity AS STRING), '\005',
> CAST(l_extendedprice AS STRING), '\005',
> CAST(l_discount AS STRING), '\005',
> CAST(l_tax AS STRING), '\005',
> CAST(l_returnflag AS STRING), '\005',
> CAST(l_linestatus AS STRING), '\005',
> CAST(l_shipdate AS STRING), '\005',
> CAST(l_commitdate AS STRING), '\005',
> CAST(l_receiptdate AS STRING), '\005',
> CAST(l_shipinstruct AS STRING), '\005',
> CAST(l_shipmode AS STRING), '\005',
> CAST(l_comment AS STRING)
> ), '\004'
> ) AS lineitems_string
> FROM tpch_parquet.lineitem
> INNER JOIN [SHUFFLE] tpch_parquet.orders ON o_orderkey = l_orderkey
> WHERE o_orderkey % 1 = 0
> GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
> ...
> F1207 04:12:08.972215 19953 partitioned-hash-join-node.cc:291] Check failed:
> probe_batch_pos_ == probe_batch_->num_rows() || probe_batch_pos_ == -1
> {noformat}
> The error log shows:
> {noformat}
> F1207 04:12:08.972215 19953 partitioned-hash-join-node.cc:291] Check failed:
> probe_batch_pos_ == probe_batch_->num_rows() || probe_batch_pos_ == -1
> *** Check failure stack trace: ***
> @ 0x3bdcefd google::LogMessage::Fail()
> @ 0x3bde7a2 google::LogMessage::SendToLog()
> @ 0x3bdc8d7 google::LogMessage::Flush()
> @ 0x3bdfe9e google::LogMessageFatal::~LogMessageFatal()
> @ 0x28bd4db impala::PartitionedHashJoinNode::NextProbeRowBatch()
> @ 0x28c1741 impala::PartitionedHashJoinNode::GetNext()
> @ 0x289f71f impala::PartitionedAggregationNode::GetRowsStreaming()
> @ 0x289d8d5 impala::PartitionedAggregationNode::GetNext()
> @ 0x1891d1c impala::FragmentInstanceState::ExecInternal()
> @ 0x188f629 impala::FragmentInstanceState::Exec()
> @ 0x1878c0a impala::QueryState::ExecFInstance()
> @ 0x18774cc _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @ 0x1879849 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x17c64ba boost::function0<>::operator()()
> @ 0x1abb5a1 impala::Thread::SuperviseThread()
> @ 0x1ac412c boost::_bi::list4<>::operator()<>()
> @ 0x1ac406f boost::_bi::bind_t<>::operator()()
> @ 0x1ac4032 boost::detail::thread_data<>::run()
> @ 0x2d668ca thread_proxy
> @ 0x7fe9287146ba start_thread
> @ 0x7fe92844a3dd clone
> Picked up JAVA_TOOL_OPTIONS:
> -agentlib:jdwp=transport=dt_socket,address=30002,server=y,suspend=n
> {noformat}
> To reproduce this, start an M5.4xlarge instance with 250GB of space:
> {noformat}
> sudo apt-get update
> sudo apt-get install --yes git
> git init ~/Impala
> pushd ~/Impala
> git fetch https://github.com/apache/impala master
> git checkout FETCH_HEAD
> ./bin/bootstrap_development.sh | tee -a $(mktemp -p ~)
> {noformat}
> You might need to fiddle with the default security group; I'm not sure. You
> can test on an M4.4xlarge, since the above script should work there.