Quanlong Huang created IMPALA-13378:
---------------------------------------

             Summary: Impalad crash in RowDescriptor::InitTupleIdxMap()
                 Key: IMPALA-13378
                 URL: https://issues.apache.org/jira/browse/IMPALA-13378
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Quanlong Huang


We saw a crash in RowDescriptor::InitTupleIdxMap() with a crash address of 0x0:
{noformat}
 0  impalad!impala::RowDescriptor::InitTupleIdxMap() [descriptors.cc : 432 + 0x3]
 1  impalad!impala::RowDescriptor::RowDescriptor(impala::DescriptorTbl const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&) [descriptors.cc : 401 + 0x8]
 2  impalad!impala::PlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [exec-node.cc : 79 + 0x20]
 3  impalad!impala::ScanPlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [scan-node.cc : 102 + 0x8]
 4  impalad!impala::HdfsScanPlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [hdfs-scan-node-base.cc : 156 + 0xc]
 5  impalad!impala::PlanNode::CreateTreeHelper(impala::FragmentState*, std::vector<impala::TPlanNode, std::allocator<impala::TPlanNode> > const&, impala::PlanNode*, int*, impala::PlanNode**) [exec-node.cc : 145 + 0x10]
 6  impalad!impala::PlanNode::CreateTree(impala::FragmentState*, impala::TPlan const&, impala::PlanNode**) [exec-node.cc : 104 + 0x5]
 7  impalad!impala::FragmentState::Init() [fragment-state.cc : 84 + 0x10]
 8  impalad!impala::FragmentState::CreateFragmentStateMap(impala::TExecPlanFragmentInfo const&, impala::ExecQueryFInstancesRequestPB const&, impala::QueryState*, std::unordered_map<int, impala::FragmentState*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, impala::FragmentState*> > >&) [fragment-state.cc : 78 + 0xc]
 9  impalad!impala::QueryState::StartFInstances() [query-state.cc : 820 + 0x2e]
{noformat}
The relevant code is:
{code:cpp}
428 void RowDescriptor::InitTupleIdxMap() {
429   // find max id
430   TupleId max_id = 0;
431   for (int i = 0; i < tuple_desc_map_.size(); ++i) {
432     max_id = max(tuple_desc_map_[i]->id(), max_id); // <-- Crash here
433   }{code}
It seems that 'tuple_desc_map_[i]' is null here. 'tuple_desc_map_' is initialized in the parent frame, the RowDescriptor constructor:
{code:cpp}
391 RowDescriptor::RowDescriptor(const DescriptorTbl& desc_tbl,
392                              const vector<TTupleId>& row_tuples,
393                              const vector<bool>& nullable_tuples)
394   : tuple_idx_nullable_map_(nullable_tuples) {
395   DCHECK_EQ(nullable_tuples.size(), row_tuples.size());
396   DCHECK_GT(row_tuples.size(), 0);
397   for (int i = 0; i < row_tuples.size(); ++i) {
398     tuple_desc_map_.push_back(desc_tbl.GetTupleDescriptor(row_tuples[i])); // <-- init here
399     DCHECK(tuple_desc_map_.back() != NULL);
400   }
401   InitTupleIdxMap();
402   InitHasVarlenSlots();
403 }{code}
There is a DCHECK to ensure the TupleDescriptor pointer is not NULL, but DCHECKs are not executed in RELEASE builds, so a NULL pointer is silently stored and later dereferenced.
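A minimal sketch of a check that also holds in RELEASE builds: since glog's DCHECK is compiled out when NDEBUG is defined, the lookup result has to be tested unconditionally for a missing descriptor to be caught. All names below (FakeDescriptorTbl, BuildTupleDescMap, MaxTupleId) are hypothetical stand-ins, not Impala's actual classes, and the sketch assumes, as the DCHECK implies, that GetTupleDescriptor() returns NULL for an unknown tuple id. A real fix would presumably propagate a Status rather than a bool.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for Impala's TupleId / TupleDescriptor / DescriptorTbl.
using TupleId = int;

struct TupleDescriptor {
  TupleId id;
};

struct FakeDescriptorTbl {
  std::unordered_map<TupleId, TupleDescriptor> tuples;

  // Assumed behavior: returns nullptr when the tuple id is unknown.
  const TupleDescriptor* GetTupleDescriptor(TupleId id) const {
    auto it = tuples.find(id);
    return it == tuples.end() ? nullptr : &it->second;
  }
};

// Builds the per-row tuple descriptor list, checking every lookup in all build
// types. Returns false and sets 'error' instead of storing a null pointer.
bool BuildTupleDescMap(const FakeDescriptorTbl& desc_tbl,
                       const std::vector<TupleId>& row_tuples,
                       std::vector<const TupleDescriptor*>* tuple_desc_map,
                       std::string* error) {
  tuple_desc_map->clear();
  for (TupleId tuple_id : row_tuples) {
    const TupleDescriptor* desc = desc_tbl.GetTupleDescriptor(tuple_id);
    if (desc == nullptr) {
      *error = "Tuple descriptor not found for tuple id " + std::to_string(tuple_id);
      return false;
    }
    tuple_desc_map->push_back(desc);
  }
  return true;
}

// Equivalent of the max-id loop in InitTupleIdxMap(); only called after
// BuildTupleDescMap() succeeded, so every pointer is non-null.
TupleId MaxTupleId(const std::vector<const TupleDescriptor*>& tuple_desc_map) {
  TupleId max_id = 0;
  for (const TupleDescriptor* desc : tuple_desc_map) {
    if (desc->id > max_id) max_id = desc->id;
  }
  return max_id;
}
```

With descriptors for tuple ids 0 and 2, row_tuples {0, 2} succeeds and MaxTupleId() returns 2, while row_tuples {0, 1} fails with an error message instead of reaching a null dereference.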

'desc_tbl' comes from TQueryCtx and 'row_tuples' comes from TExecPlanFragmentInfo. We should verify that they are consistent before starting any of the fragment instances:
[https://github.com/apache/impala/blob/3e1b10556bc83b0e697b7a2aac411ccad6094563/be/src/service/control-service.cc#L162-L163]
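A rough sketch of such an up-front consistency check: walk every plan node of every fragment and make sure each tuple id in its row tuples exists in the query's descriptor table, failing the whole request before any fragment instance starts. The PlanNodeInfo / FragmentInfo structs and FindMissingTupleId() are hypothetical simplifications; the real check would presumably work on the descriptor table from TQueryCtx and the plan nodes in TExecPlanFragmentInfo.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

using TupleId = int;

// Hypothetical, simplified view of a plan node: its id and the tuple ids of
// its output row (what is passed as 'row_tuples' to the RowDescriptor ctor).
struct PlanNodeInfo {
  int node_id;
  std::vector<TupleId> row_tuples;
};

// Hypothetical, simplified view of one fragment's plan.
struct FragmentInfo {
  std::vector<PlanNodeInfo> nodes;
};

// Returns an error message for the first plan node referencing a tuple id that
// is missing from the descriptor table, or std::nullopt if all ids are known.
std::optional<std::string> FindMissingTupleId(
    const std::unordered_set<TupleId>& known_tuple_ids,
    const std::vector<FragmentInfo>& fragments) {
  for (const FragmentInfo& fragment : fragments) {
    for (const PlanNodeInfo& node : fragment.nodes) {
      for (TupleId tuple_id : node.row_tuples) {
        if (known_tuple_ids.count(tuple_id) == 0) {
          return "Plan node " + std::to_string(node.node_id) +
                 " references unknown tuple id " + std::to_string(tuple_id);
        }
      }
    }
  }
  return std::nullopt;
}
```

Doing the check once per request keeps RowDescriptor construction unchanged and turns an inconsistent request into a query error rather than an impalad crash.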

Why they could be inconsistent still needs further investigation. The same query sometimes succeeded, and we also saw other queries cause a similar crash, i.e. a crash in RowDescriptor::InitTupleIdxMap() but coming from the initialization of different PlanNodes.

This might have the same cause as IMPALA-13107. CC [~wzhou], [~MikaelSmith], [~prozsa]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
