[ 
https://issues.apache.org/jira/browse/IMPALA-11998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Zihao updated IMPALA-11998:
------------------------------
    Description: 
{code:c++}
Status ImpalaServer::GetQueryRecord(const TUniqueId& query_id,
    QueryLogIndex::const_iterator* query_record) {
  lock_guard<mutex> l(query_log_lock_);
  *query_record = query_log_index_.find(query_id); 
  ...
  return Status::OK();
}
{code}
This may cause the caller to access an invalid iterator. Although the function 
holds query_log_lock_ while it runs, the query_record it returns is not 
guaranteed to remain valid, because it is no longer protected by 
query_log_lock_ once the function returns. If the corresponding record is 
deleted from query_log_index_ at that point, query_record becomes an invalid 
iterator.
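
One possible way to avoid handing out an iterator is sketched below. It is only an illustration under stated assumptions, not the actual Impala code or fix: QueryLog, QueryRecord, Insert(), Evict() and Get() are hypothetical names, and the map stores shared_ptr values so that a lookup can return shared ownership of the record while the lock is held. A caller that received the pointer can keep reading the record after the lock is released, even if the entry is erased from the index in the meantime.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the per-query record kept in the query log.
struct QueryRecord {
  std::string profile;
};

// Hypothetical, simplified query log. Only the locking pattern matters here.
class QueryLog {
 public:
  void Insert(const std::string& query_id, std::string profile) {
    std::lock_guard<std::mutex> l(lock_);
    index_[query_id] =
        std::make_shared<QueryRecord>(QueryRecord{std::move(profile)});
  }

  void Evict(const std::string& query_id) {
    std::lock_guard<std::mutex> l(lock_);
    index_.erase(query_id);
  }

  // Returns shared ownership of the record, or nullptr if it is not present.
  // The shared_ptr is copied while the lock is held, so the record stays
  // alive for the caller even if the map entry is erased afterwards.
  std::shared_ptr<const QueryRecord> Get(const std::string& query_id) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = index_.find(query_id);
    if (it == index_.end()) return nullptr;
    return it->second;
  }

 private:
  mutable std::mutex lock_;
  std::unordered_map<std::string, std::shared_ptr<const QueryRecord>> index_;
};
```

With this shape, the caller checks the returned pointer for nullptr instead of comparing an iterator against end(), and no reference into the container escapes the critical section.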

There is a very small probability that this issue may cause impalad to crash:
{code:c++}
Stack: [0x00007f5be789f000,0x00007f5be809f000],  sp=0x00007f5be8099a00,  free space=8170k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [impalad+0x118b08d]  impala::ImpalaServer::GetRuntimeProfileOutput(impala::TUniqueId const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, impala::TRuntimeProfileFormat::type, impala::ImpalaServer::RuntimeProfileOutput*)+0x1dd
C  [impalad+0x1167213]  impala::ImpalaHttpHandler::QueryProfileHelper(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*, impala::TRuntimeProfileFormat::type)+0x53d
C  [impalad+0x116774a]  impala::ImpalaHttpHandler::QueryProfileTextHandler(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*)+0x16
C  [impalad+0x115bd0d]  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > apache::thrift::to_string<std::vector<int, std::allocator<int> > >(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)+0x141
C  [impalad+0x14d158b]  impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::ContentType*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x177
C  [impalad+0x14d5423]  impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*)+0x1c71
C  [impalad+0x14d5d52]  impala::Webserver::BeginRequestCallbackStatic(sq_connection*)+0x20
C  [impalad+0x14e5a87]  impala::ScanRangesPB::~ScanRangesPB()+0x111
C  [impalad+0x14e7d28]  impala::ScanRangeParamsPB::_InternalSerialize(unsigned char*, google::protobuf::io::EpsCopyOutputStream*) const+0x13e
C  [impalad+0x14e83ec]  impala::PlanFragmentInstanceCtxPB::_InternalSerialize(unsigned char*, google::protobuf::io::EpsCopyOutputStream*) const+0x3ce
{code}
To trigger this issue, I used a script that repeatedly requests the oldest 
record on the queries page:
{code:python}
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup

root = "http://localhost:25000/"
queries = root + "queries"
profile = root + "query_profile_plain_text?query_id="
while True:
    # Fetch the queries page and collect every "Details" link.
    response = requests.get(queries)
    soup = BeautifulSoup(response.content, "html.parser")
    details_links = soup.find_all("a", string="Details")
    if not details_links:
        # No queries are listed yet; try again.
        continue
    # The last link on the page belongs to the oldest record.
    details_url = details_links[-1]["href"]
    # The query id is the last 33 characters of the link.
    query_id = details_url[-33:]
    response = requests.get(profile + query_id)
    # Print only the beginning of the returned profile.
    print(response.text[:44])
{code}
At the same time, I continuously executed select 1 using another script. To 
increase the probability of triggering the issue, I added a small delay after 
the GetQueryRecord() call. With that delay in place, the crash was easy to 
trigger.

  was:
{code:c++}
Status ImpalaServer::GetQueryRecord(const TUniqueId& query_id,
    QueryLogIndex::const_iterator* query_record) {
  lock_guard<mutex> l(query_log_lock_);
  *query_record = query_log_index_.find(query_id); 
  ...
  return Status::OK();
}
{code}

This may cause the caller to access an invalid iterator. Although the function 
holds query_log_lock_ while it runs, the query_record it returns is not 
guaranteed to remain valid, because it is no longer protected by 
query_log_lock_ once the function returns. If the corresponding record is 
deleted from query_log_index_ at that point, query_record becomes an invalid 
iterator.


> The iterator provided by ImpalaServer::GetQueryRecord() may become invalid
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-11998
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11998
>             Project: IMPALA
>          Issue Type: Bug
>          Components: be
>    Affects Versions: Impala 4.2.0
>            Reporter: Ye Zihao
>            Assignee: Ye Zihao
>            Priority: Critical


