[ 
https://issues.apache.org/jira/browse/DRILL-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372296#comment-14372296
 ] 

Parth Chandra commented on DRILL-2509:
--------------------------------------

I think the way to address this is to protect m_pendingRequests with its own 
mutex. The real issue is that as part of handling a response, the read handler 
sends back an ack _synchronously_. The locking is broken because taking the 
mutex to protect m_pendingRequests correctly causes a deadlock.
Longer term (post release 1.0) we can spend some time looking at how to clean 
up the state management.
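
A minimal sketch of what a separately locked counter could look like. This is 
an assumption for illustration, not the Drill client code: the class name 
PendingRequests and its methods add(), complete() and count() are 
hypothetical. The idea is that each caller gets its decision back from the 
same critical section that changed the counter, so nobody needs to re-read 
m_pendingRequests after a lock has been released, and the read handler does 
not need to hold the counter's mutex while it sends the ack.

```cpp
#include <cstddef>
#include <mutex>

// Hypothetical helper, not part of the Drill client: a pending-request
// counter guarded by its own mutex, independent of any lock held by the
// read handler.
class PendingRequests {
public:
    // Register a new request. Returns true only for the caller that moved
    // the count from 0 to 1, i.e. the one that should start reading.
    bool add() {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_count++ == 0;
    }

    // Mark one request as finished. Returns true if requests are still
    // outstanding, i.e. the caller should keep reading.
    bool complete() {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (m_count > 0) {
            --m_count;
        }
        return m_count > 0;
    }

    // Snapshot of the counter, for diagnostics only.
    std::size_t count() const {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_count;
    }

private:
    mutable std::mutex m_mutex;  // protects m_count and nothing else
    std::size_t m_count = 0;
};
```

With something like this, the thread finishing the last query would act on 
the value returned by complete() and the thread submitting the next query 
would act on the value returned by add(), so only one of them would go on to 
call getNextResult.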

> Drill Client threading issue with m_pendingRequests
> ---------------------------------------------------
>
>                 Key: DRILL-2509
>                 URL: https://issues.apache.org/jira/browse/DRILL-2509
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - C++
>    Affects Versions: 0.7.0
>            Reporter: Norris Lee
>            Assignee: Norris Lee
>             Fix For: 0.9.0
>
>
> 1.       (Thread 1) 1st query receives the last record batch (no data, only 
> contains the query state, e.g. QueryResult_QueryState_COMPLETED) and calls 
> processQueryResult. At this moment, it grabs the lock. Assume 
> m_pendingRequests = 1.
> 2.       (Thread 1) processQueryStatusResult is called and drops 
> m_pendingRequests to 0. The lock is still held at this point.
> 3.       (Thread 1) It returns from processQueryStatusResult, immediately 
> followed by returning from processQueryResult. At this point the lock is 
> released.
> 4.       (Thread 2) SubmitQuery for the next query sees the lock has been 
> freed so it swoops in and grabs the lock. It bumps m_pendingRequests up to 1 
> and sets sendRequest=true since it sees that m_pendingRequests was previously 
> 0, causing it to call getNextResult.
> 5.       (Thread 1) After returning from processQueryResult, it sees that 
> m_pendingRequests is now set to 1 so it calls getNextResult.
> 6.       So now 2 threads end up calling getNextResult. For whatever reason, 
> the server then sends a record batch with no data, has_rpc_type = false, and 
> rpc_type = 0, leading to an ERR_QRY_INVRPCTYPE error being thrown.
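
The interleaving in steps 1 to 5 can be replayed deterministically on a single 
thread to show why getNextResult ends up being called twice. This is only a 
sketch under stated assumptions: the Client struct, its fields and 
replayInterleaving() are hypothetical stand-ins that mirror the names in the 
report, not the actual Drill client code.

```cpp
#include <mutex>

// Hypothetical stand-in for the client state described in the report.
struct Client {
    std::mutex lock;
    int pendingRequests = 1;      // step 1: one query in flight
    int getNextResultCalls = 0;   // how many times a read was started

    void getNextResult() { ++getNextResultCalls; }
};

// Replays steps 1 to 5 in the order given in the report and returns the
// number of getNextResult calls that result.
int replayInterleaving() {
    Client c;

    {   // Steps 1-3 (thread 1): the final batch is processed under the lock.
        std::lock_guard<std::mutex> guard(c.lock);
        --c.pendingRequests;      // drops to 0
    }                             // lock released here

    {   // Step 4 (thread 2): the next query is submitted.
        bool sendRequest = false;
        {
            std::lock_guard<std::mutex> guard(c.lock);
            sendRequest = (c.pendingRequests++ == 0);
        }
        if (sendRequest) {
            c.getNextResult();
        }
    }

    // Step 5 (thread 1): the counter is re-read after the lock was released,
    // so it observes thread 2's increment and starts a second read.
    if (c.pendingRequests > 0) {
        c.getNextResult();
    }

    return c.getNextResultCalls;
}
```

In this replay the check in step 5 is the problem: it acts on a value that 
another thread changed after the lock was dropped.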



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
