[ 
https://issues.apache.org/jira/browse/PHOENIX-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324764#comment-15324764
 ] 

Josh Elser commented on PHOENIX-2940:
-------------------------------------

bq. I'd get rid of the halfStatsUpdateFreq and expireAfterWrite logic. 

I just thought of an interesting point. If we remove the {{expireAfterFoo()}} 
logic and we fail to fetch the stats for a table, we may keep an instance of 
{{EMPTY_STATS}} in the cache for an indeterminant amount of time. I guess there 
are some means for users to cause this be cleared (entirely, or for a specific 
table)? At least, I see methods on {{ConnectionQueryServices}} but haven't 
traced them back through the parser to see how users can twiddle them. So, 
maybe it's fine to not have any time-based eviction then?

> Remove STATS RPCs from rowlock
> ------------------------------
>
>                 Key: PHOENIX-2940
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2940
>             Project: Phoenix
>          Issue Type: Improvement
>         Environment: HDP 2.3 + Apache Phoenix 4.6.0
>            Reporter: Nick Dimiduk
>            Assignee: Josh Elser
>             Fix For: 4.9.0
>
>         Attachments: PHOENIX-2940.001.patch, PHOENIX-2940.002.patch
>
>
> We have an unfortunate situation wherein we potentially execute many RPCs 
> while holding a row lock. This is problem is discussed in detail on the user 
> list thread ["Write path blocked by MetaDataEndpoint acquiring region 
> lock"|http://search-hadoop.com/m/9UY0h2qRaBt6Tnaz1&subj=Write+path+blocked+by+MetaDataEndpoint+acquiring+region+lock].
>  During some situations, the 
> [MetaDataEndpoint|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L492]
>  coprocessor will attempt to refresh it's view of the schema definitions and 
> statistics. This involves [taking a 
> rowlock|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L2862],
>  executing a scan against the [local 
> region|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L542],
>  and then a scan against a [potentially 
> remote|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L964]
>  statistics table.
> This issue is apparently exacerbated by the use of user-provided timestamps 
> (in my case, the use of the ROW_TIMESTAMP feature, or perhaps as in 
> PHOENIX-2607). When combined with other issues (PHOENIX-2939), we end up with 
> total gridlock in our handler threads -- everyone queued behind the rowlock, 
> scanning and rescanning SYSTEM.STATS. Because this happens in the 
> MetaDataEndpoint, the means by which all clients refresh their knowledge of 
> schema, gridlock in that RS can effectively stop all forward progress on the 
> cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to