[ 
https://issues.apache.org/jira/browse/IMPALA-13378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13378.
------------------------------------
     Fix Version/s: Impala 4.5.0
    Target Version: Impala 4.4.2
        Resolution: Fixed

> Impalad crash in RowDescriptor::InitTupleIdxMap()
> -------------------------------------------------
>
>                 Key: IMPALA-13378
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13378
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.5.0
>
>
>  We saw a crash in RowDescriptor::InitTupleIdxMap() with crash address 0x0,
> which points to a null pointer dereference:
> {noformat}
>  0  impalad!impala::RowDescriptor::InitTupleIdxMap() [descriptors.cc : 432 + 0x3]
>  1  impalad!impala::RowDescriptor::RowDescriptor(impala::DescriptorTbl const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&) [descriptors.cc : 401 + 0x8]
>  2  impalad!impala::PlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [exec-node.cc : 79 + 0x20]
>  3  impalad!impala::ScanPlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [scan-node.cc : 102 + 0x8]
>  4  impalad!impala::HdfsScanPlanNode::Init(impala::TPlanNode const&, impala::FragmentState*) [hdfs-scan-node-base.cc : 156 + 0xc]
>  5  impalad!impala::PlanNode::CreateTreeHelper(impala::FragmentState*, std::vector<impala::TPlanNode, std::allocator<impala::TPlanNode> > const&, impala::PlanNode*, int*, impala::PlanNode**) [exec-node.cc : 145 + 0x10]
>  6  impalad!impala::PlanNode::CreateTree(impala::FragmentState*, impala::TPlan const&, impala::PlanNode**) [exec-node.cc : 104 + 0x5]
>  7  impalad!impala::FragmentState::Init() [fragment-state.cc : 84 + 0x10]
>  8  impalad!impala::FragmentState::CreateFragmentStateMap(impala::TExecPlanFragmentInfo const&, impala::ExecQueryFInstancesRequestPB const&, impala::QueryState*, std::unordered_map<int, impala::FragmentState*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, impala::FragmentState*> > >&) [fragment-state.cc : 78 + 0xc]
>  9  impalad!impala::QueryState::StartFInstances() [query-state.cc : 820 + 0x2e]
> {noformat}
> The relevant code is:
> {code:cpp}
> 428 void RowDescriptor::InitTupleIdxMap() {
> 429   // find max id
> 430   TupleId max_id = 0;
> 431   for (int i = 0; i < tuple_desc_map_.size(); ++i) {
> 432     max_id = max(tuple_desc_map_[i]->id(), max_id); // <-- Crash here
> 433   }{code}
> It seems 'tuple_desc_map_[i]' is null here. 'tuple_desc_map_' is initialized 
> in the parent frame:
> {code:cpp}
> 391 RowDescriptor::RowDescriptor(const DescriptorTbl& desc_tbl,
> 392                              const vector<TTupleId>& row_tuples,
> 393                              const vector<bool>& nullable_tuples)
> 394   : tuple_idx_nullable_map_(nullable_tuples) {
> 395   DCHECK_EQ(nullable_tuples.size(), row_tuples.size());
> 396   DCHECK_GT(row_tuples.size(), 0);
> 397   for (int i = 0; i < row_tuples.size(); ++i) {
> 398     tuple_desc_map_.push_back(desc_tbl.GetTupleDescriptor(row_tuples[i])); // <-- init here
> 399     DCHECK(tuple_desc_map_.back() != NULL);
> 400   }
> 401   InitTupleIdxMap();
> 402   InitHasVarlenSlots();
> 403 }{code}
> A DCHECK guards against a NULL TupleDescriptor pointer, but DCHECKs are
> compiled out of RELEASE builds, so the check never runs there.
> 'desc_tbl' comes from TQueryCtx and 'row_tuples' comes from 
> TExecPlanFragmentInfo. We should verify whether they are consistent before 
> starting all the fragment instances.
> [https://github.com/apache/impala/blob/3e1b10556bc83b0e697b7a2aac411ccad6094563/be/src/service/control-service.cc#L162-L163]
> Why they could be inconsistent still needs further investigation. The same
> query sometimes succeeded, and other queries caused a similar crash in
> RowDescriptor::InitTupleIdxMap(), reached from the initialization of
> different PlanNodes.
> This might have the same cause as IMPALA-13107. CC [~wzhou], [~MikaelSmith],
> [~prozsa]
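
The consistency check proposed in the description could look roughly like the sketch below. It is not the fix that was committed for this issue. TupleDescriptor and DescriptorTbl here are simplified, hypothetical stand-ins for the Impala classes, and ValidateRowTuples() is an invented helper name. The only behavior assumed from the report is that GetTupleDescriptor() returns a null pointer for an unknown tuple id. Unlike a DCHECK, the check runs in every build type and reports an error instead of dereferencing the null pointer.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for Impala's types; not the real classes.
using TupleId = int;

struct TupleDescriptor {
  TupleId id;
};

// Simplified descriptor table. Returns nullptr for an unknown tuple id, which
// is the behavior the crash report implies for GetTupleDescriptor().
class DescriptorTbl {
 public:
  void Add(TupleId id) { tuples_.emplace(id, TupleDescriptor{id}); }

  const TupleDescriptor* GetTupleDescriptor(TupleId id) const {
    auto it = tuples_.find(id);
    return it == tuples_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<TupleId, TupleDescriptor> tuples_;
};

// Invented helper: returns true if every tuple id referenced by a plan node
// resolves in the descriptor table. On failure it returns false and writes a
// message to *error. The check is active in RELEASE builds as well.
bool ValidateRowTuples(const DescriptorTbl& desc_tbl,
                       const std::vector<TupleId>& row_tuples,
                       std::string* error) {
  if (row_tuples.empty()) {
    *error = "plan node references no tuples";
    return false;
  }
  for (TupleId id : row_tuples) {
    if (desc_tbl.GetTupleDescriptor(id) == nullptr) {
      *error = "tuple id " + std::to_string(id) +
               " not found in descriptor table";
      return false;
    }
  }
  return true;
}
```

A caller would run such a check once per plan node before constructing the RowDescriptor and fail the query with the returned message, rather than letting the backend crash.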



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
