Quanlong Huang created IMPALA-13378:
---------------------------------------
Summary: Impalad crash in RowDescriptor::InitTupleIdxMap()
Key: IMPALA-13378
URL: https://issues.apache.org/jira/browse/IMPALA-13378
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Quanlong Huang
We saw an impalad crash in RowDescriptor::InitTupleIdxMap(). The crash
address is 0x0:
{noformat}
0 impalad!impala::RowDescriptor::InitTupleIdxMap() [descriptors.cc : 432 + 0x3]
1 impalad!impala::RowDescriptor::RowDescriptor(impala::DescriptorTbl const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&) [descriptors.cc : 401 + 0x8]
2 impalad!impala::PlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [exec-node.cc : 79 + 0x20]
3 impalad!impala::ScanPlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [scan-node.cc : 102 + 0x8]
4 impalad!impala::HdfsScanPlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [hdfs-scan-node-base.cc : 156 + 0xc]
5 impalad!impala::PlanNode::CreateTreeHelper(impala::FragmentState*, std::vector<impala::TPlanNode, std::allocator<impala::TPlanNode> > const&, impala::PlanNode*, int*, impala::PlanNode**) [exec-node.cc : 145 + 0x10]
6 impalad!impala::PlanNode::CreateTree(impala::FragmentState*, impala::TPlan const&, impala::PlanNode**) [exec-node.cc : 104 + 0x5]
7 impalad!impala::FragmentState::Init() [fragment-state.cc : 84 + 0x10]
8 impalad!impala::FragmentState::CreateFragmentStateMap(impala::TExecPlanFragmentInfo const&, impala::ExecQueryFInstancesRequestPB const&, impala::QueryState*, std::unordered_map<int, impala::FragmentState*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, impala::FragmentState*> > >&) [fragment-state.cc : 78 + 0xc]
9 impalad!impala::QueryState::StartFInstances() [query-state.cc : 820 + 0x2e]
{noformat}
The code at the crash site (frame 0) is:
{code:cpp}
428 void RowDescriptor::InitTupleIdxMap() {
429   // find max id
430   TupleId max_id = 0;
431   for (int i = 0; i < tuple_desc_map_.size(); ++i) {
432     max_id = max(tuple_desc_map_[i]->id(), max_id); // <-- Crash here
433   }
{code}
It seems 'tuple_desc_map_[i]' is null here. 'tuple_desc_map_' is initialized in
the parent frame:
{code:cpp}
391 RowDescriptor::RowDescriptor(const DescriptorTbl& desc_tbl,
392     const vector<TTupleId>& row_tuples,
393     const vector<bool>& nullable_tuples)
394   : tuple_idx_nullable_map_(nullable_tuples) {
395   DCHECK_EQ(nullable_tuples.size(), row_tuples.size());
396   DCHECK_GT(row_tuples.size(), 0);
397   for (int i = 0; i < row_tuples.size(); ++i) {
398     tuple_desc_map_.push_back(desc_tbl.GetTupleDescriptor(row_tuples[i])); // <-- init here
399     DCHECK(tuple_desc_map_.back() != NULL);
400   }
401   InitTupleIdxMap();
402   InitHasVarlenSlots();
403 }
{code}
We have a DCHECK to make sure the TupleDescriptor pointer is not NULL, but
DCHECKs are not executed in RELEASE builds, so the null pointer is only hit
later in InitTupleIdxMap().
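As an illustration of what a check that also runs in RELEASE builds could look like, here is a minimal, self-contained sketch. It does not use Impala's real classes: TupleDescriptor, DescriptorTbl and CollectTupleDescriptors() below are simplified, hypothetical stand-ins, and it assumes GetTupleDescriptor() returns a null pointer for an unknown tuple id, which is what the crash suggests. The lookup failure is reported as an error message instead of being left to a DCHECK.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for Impala's types; not the real classes.
using TupleId = int;
struct TupleDescriptor {
  TupleId id;
};

struct DescriptorTbl {
  std::unordered_map<TupleId, TupleDescriptor> tuples;

  // Assumption: an unknown id yields nullptr, as the crash suggests.
  const TupleDescriptor* GetTupleDescriptor(TupleId id) const {
    auto it = tuples.find(id);
    return it == tuples.end() ? nullptr : &it->second;
  }
};

// Resolves every id in 'row_tuples' against 'desc_tbl'.
// Returns an empty string on success, or an error message naming the first
// tuple id that is missing. Unlike a DCHECK, this runs in every build type.
std::string CollectTupleDescriptors(const DescriptorTbl& desc_tbl,
    const std::vector<TupleId>& row_tuples,
    std::vector<const TupleDescriptor*>* out) {
  out->clear();
  for (TupleId id : row_tuples) {
    const TupleDescriptor* desc = desc_tbl.GetTupleDescriptor(id);
    if (desc == nullptr) {
      return "Tuple id " + std::to_string(id) + " not found in descriptor table";
    }
    out->push_back(desc);
  }
  return "";
}
```

With a check of this shape, an inconsistent request would fail the query with an error rather than crash the impalad. How such an error would be propagated in the real code (for example via Status from PlanNode::Init()) is not covered by this sketch.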
'desc_tbl' comes from TQueryCtx and 'row_tuples' comes from
TExecPlanFragmentInfo. We should verify whether they are consistent before
starting all the fragment instances.
[https://github.com/apache/impala/blob/3e1b10556bc83b0e697b7a2aac411ccad6094563/be/src/service/control-service.cc#L162-L163]
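A rough sketch of such an up-front consistency check is shown below. It is only an outline under simplifying assumptions: PlanNodeInfo and FindMissingTupleIds() are hypothetical names, the descriptor table is reduced to the set of tuple ids it contains, and each plan node is reduced to the list of tuple ids it references. The real Thrift structures carry far more than this.

```cpp
#include <cassert>
#include <set>
#include <vector>

using TupleId = int;

// Hypothetical, simplified view of one plan node: only the tuple ids it
// references are kept.
struct PlanNodeInfo {
  int node_id;
  std::vector<TupleId> row_tuples;
};

// Returns the tuple ids that are referenced by some plan node but are absent
// from 'known_ids' (the ids present in the descriptor table).
// An empty result means the two inputs are consistent.
std::set<TupleId> FindMissingTupleIds(const std::set<TupleId>& known_ids,
    const std::vector<PlanNodeInfo>& nodes) {
  std::set<TupleId> missing;
  for (const PlanNodeInfo& node : nodes) {
    for (TupleId id : node.row_tuples) {
      if (known_ids.count(id) == 0) missing.insert(id);
    }
  }
  return missing;
}
```

Running a check like this once per request, before any fragment instance is started, would let the whole request be rejected with a message listing the missing ids, which might also help narrow down where the inconsistency comes from.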
Why they could be inconsistent still needs further investigation. We saw the
same query succeed sometimes, and also saw other queries cause a similar
crash, i.e. a crash in RowDescriptor::InitTupleIdxMap() reached from the
initialization of different PlanNodes.
This might have the same cause as IMPALA-13107. CC [~wzhou], [~MikaelSmith], [~prozsa]