[ https://issues.apache.org/jira/browse/KUDU-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730818#comment-15730818 ]
Todd Lipcon commented on KUDU-1794:
-----------------------------------

Does this map to the Impala JIRA https://issues.cloudera.org/browse/IMPALA-4334 ?

> kuduScanner 's problem causing impala crash.
> --------------------------------------------
>
>                 Key: KUDU-1794
>                 URL: https://issues.apache.org/jira/browse/KUDU-1794
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: zhangsong
>
> Sometimes an impalad in my cluster crashes. After studying the core file, I
> found that a null pointer in the data field of ScanResponsePB causes the
> impalad crash.
> So I made a small modification to "NextBatch" in client.cc:
> "
>   if (data_->data_in_open_) {
>     // We have data from a previous scan.
>     VLOG(1) << "Extracting data from scan " << ToString();
>     data_->data_in_open_ = false;
>     auto scan_response_data_ptr = data_->last_response_.release_data();
>     if (PREDICT_FALSE(scan_response_data_ptr == nullptr)) {
>       return Status::Corruption(Substitute(
>           "Kudu scanner against $0 is in open status,"
>           "but scan resp has no data.Scan query: $1.Remote: $2",
>           data_->table_->name(),
>           data_->configuration().spec().ToString(*data_->table_->schema().schema_),
>           data_->ts_->ToString(),
>           data_->last_response_.DebugString()));
> "
> I also made some modifications to the Impala side of the code:
> "
>   if (UNLIKELY(!status.ok())) {
>     LOG(ERROR) << "KuduScanner::GetNextScannerBatch ERROR[" << status.ToString() << "]";
>     KUDU_RETURN_IF_ERROR(status, "unable to advance kudu iterator");
>   }
> "
> After these modifications I found these errors in impalad's log:
> "E1124 11:46:50.780480 15613 kudu-scanner.cc:422]
> KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to
> 172.22.99.57:7050 timed out after 180.000s]
> "
> and
> "E1124 11:49:24.171380 16127 kudu-scanner.cc:422]
> KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to
> 172.22.99.57:7050 timed out after 164.164s: Remote error: Service
> unavailable: Scan request on kudu.tserver.TabletServerService from
> 172.22.99.57:64537 dropped due to backpressure. The service queue is full; it
> has 150 items.]
> "
> and
> "E1124 11:49:24.171378 16128 kudu-scanner.cc:422]
> KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to
> 172.22.99.57:7050 timed out after 121.593s: Not found: Scanner not found]"
> It seems that there are various reasons for the null pointer in the data field
> of ScanResponsePB, but impalad has no way of knowing them.
> Maybe last_response_.has_more_results() should return false when this
> exception happens?
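
For readers following the thread, the guard the reporter describes (check the released data pointer before using it, and return an error status instead of dereferencing null) can be sketched in isolation. This is only an illustration under stated assumptions: `Status`, `FakeScanResponse`, and `ExtractBatch` are hypothetical stand-ins written for this sketch, not the Kudu client's real types, and the only behaviour borrowed from the report is that `release_data()` may hand back a null pointer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, minimal stand-in for a status type (NOT kudu::Status).
struct Status {
  bool ok_flag;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Corruption(std::string msg) {
    return {false, "Corruption: " + std::move(msg)};
  }
  bool ok() const { return ok_flag; }
};

// Hypothetical stand-in for a scan response whose data field may be unset.
class FakeScanResponse {
 public:
  void set_data(std::string rows) {
    data_ = std::make_unique<std::string>(std::move(rows));
  }
  // Transfers ownership to the caller; returns nullptr when no data was set.
  std::string* release_data() { return data_.release(); }

 private:
  std::unique_ptr<std::string> data_;
};

// Takes ownership of the response data, or reports an error if it is missing.
Status ExtractBatch(FakeScanResponse* resp, std::unique_ptr<std::string>* out) {
  std::unique_ptr<std::string> data(resp->release_data());
  if (data == nullptr) {
    // Surface the condition to the caller rather than crashing later.
    return Status::Corruption("scanner is open but the scan response has no data");
  }
  *out = std::move(data);
  return Status::OK();
}
```

With a response that never had data set, `ExtractBatch` returns a non-OK status and leaves `out` untouched; the caller can then log and propagate the error, as the Impala-side change in the report does.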