[
https://issues.apache.org/jira/browse/PHOENIX-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302923#comment-15302923
]
Nick Dimiduk commented on PHOENIX-2940:
---------------------------------------
Thanks for picking this one up [~elserj].
I like where this proposal is going [~jamestaylor]. I have a couple
questions/observations:
bq. do not cache stats on the server side at all
I'm not sure if this is good or not. Presumably we'll want to have access to
statistics from within the coprocessors to facilitate local decisions during
query execution. I started to ask [~maryannxue] about these kinds of
requirements in her Calcite efforts. Maybe she can comment?
bq. {{ if (tenantId == null) {...} }}
I noticed this conditional yesterday and was rather startled. All the SFDC
clusters are using the multi-tenant features, which means you're never using
table stats? Is that because the stats are not maintained at tenant
granularity? Or is it the case that some queries, which cover all tenants, are still
using this code path?
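To make sure we're talking about the same thing, here's a minimal sketch of the
gating I'm describing (hypothetical names, paraphrased from memory, not the
actual MetaDataEndpointImpl code):
{code:java}
// Minimal sketch of the gating in question (hypothetical names, not the actual
// MetaDataEndpointImpl code): guide-post stats are only looked up when there is
// no tenant id, so tenant-specific connections never see them.
import java.util.Collections;
import java.util.List;

class StatsGatingSketch {
    interface StatsReader {
        // hypothetical helper standing in for the SYSTEM.STATS scan
        List<byte[]> readGuidePosts(String physicalTableName);
    }

    static List<byte[]> guidePostsFor(String tenantId, String physicalTableName,
                                      StatsReader statsReader) {
        if (tenantId == null) {
            // global (non-tenant) table: fetch guide posts from SYSTEM.STATS
            return statsReader.readGuidePosts(physicalTableName);
        }
        // tenant-specific view: no stats lookup at all
        return Collections.emptyList();
    }
}
{code}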
bq. Introduce a scheduled timer in ConnectionQueryServicesImpl that queries
the SYSTEM.STATS table
Yes, I like this approach. STATS updates are async/background operations
anyway, so the consumers' view of them should be asynchronous as well.
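Something along these lines is what I have in mind. This is a rough,
hypothetical sketch (the class name, refresh period, and the SYSTEM.STATS
column names are approximations, not a patch):
{code:java}
// Rough sketch of a background stats refresh (hypothetical): a single scheduled
// task periodically re-reads SYSTEM.STATS and swaps the result into a volatile
// map, so query compilation only ever reads a local snapshot and never blocks
// on an RPC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class StatsCacheRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private volatile Map<String, Long> guidePostBytesPerTable = new HashMap<>();
    private final String jdbcUrl;

    StatsCacheRefresher(String jdbcUrl, long periodSeconds) {
        this.jdbcUrl = jdbcUrl;
        scheduler.scheduleWithFixedDelay(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
    }

    private void refresh() {
        Map<String, Long> fresh = new HashMap<>();
        // Column names here are from memory and may not match the real schema.
        String sql = "SELECT PHYSICAL_NAME, SUM(GUIDE_POSTS_WIDTH) "
                   + "FROM SYSTEM.STATS GROUP BY PHYSICAL_NAME";
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                fresh.put(rs.getString(1), rs.getLong(2));
            }
            guidePostBytesPerTable = fresh; // atomic reference swap; readers never block
        } catch (Exception e) {
            // keep serving the previous snapshot if a refresh fails
        }
    }

    Long guidePostBytes(String physicalTableName) {
        return guidePostBytesPerTable.get(physicalTableName);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
{code}
The key point is the atomic swap: readers only ever see a complete snapshot
(old or new), and a failed refresh simply leaves the previous snapshot in place.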
> Remove STATS RPCs from rowlock
> ------------------------------
>
> Key: PHOENIX-2940
> URL: https://issues.apache.org/jira/browse/PHOENIX-2940
> Project: Phoenix
> Issue Type: Improvement
> Environment: HDP 2.3 + Apache Phoenix 4.6.0
> Reporter: Nick Dimiduk
> Assignee: Josh Elser
>
> We have an unfortunate situation wherein we potentially execute many RPCs
> while holding a row lock. This problem is discussed in detail on the user
> list thread ["Write path blocked by MetaDataEndpoint acquiring region
> lock"|http://search-hadoop.com/m/9UY0h2qRaBt6Tnaz1&subj=Write+path+blocked+by+MetaDataEndpoint+acquiring+region+lock].
> During some situations, the
> [MetaDataEndpoint|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L492]
> coprocessor will attempt to refresh its view of the schema definitions and
> statistics. This involves [taking a
> rowlock|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L2862],
> executing a scan against the [local
> region|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L542],
> and then a scan against a [potentially
> remote|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L964]
> statistics table (see the simplified sketch below).
> This issue is apparently exacerbated by the use of user-provided timestamps
> (in my case, the use of the ROW_TIMESTAMP feature, or perhaps as in
> PHOENIX-2607). When combined with other issues (PHOENIX-2939), we end up with
> total gridlock in our handler threads -- everyone queued behind the rowlock,
> scanning and rescanning SYSTEM.STATS. Because this happens in the
> MetaDataEndpoint, the means by which all clients refresh their knowledge of
> schema, gridlock in that RS can effectively stop all forward progress on the
> cluster.
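For anyone skimming the quoted description, the problematic sequence boils down
to roughly the following. This is a heavily simplified, hypothetical sketch: a
plain ReentrantLock stands in for the HBase row lock, and none of it is the
actual MetaDataEndpointImpl code.
{code:java}
// Simplified sketch of the anti-pattern described in this issue: a scan of the
// possibly-remote SYSTEM.STATS table is issued while a lock guarding the
// table's SYSTEM.CATALOG header row is still held, so every handler that needs
// the same row queues behind that RPC.
import java.io.IOException;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

class RowLockScanSketch {
    // stand-in for the per-row lock taken on the SYSTEM.CATALOG header row
    private final ReentrantLock headerRowLock = new ReentrantLock();

    long scanStatsUnderLock(Connection hbase) throws IOException {
        headerRowLock.lock();              // 1. acquire the "row lock"
        try {
            // 2. local SYSTEM.CATALOG scan to rebuild the table metadata (elided)
            // 3. scan the possibly-remote stats table while still holding the lock
            long cells = 0;
            try (Table stats = hbase.getTable(TableName.valueOf("SYSTEM.STATS"));
                 ResultScanner scanner = stats.getScanner(new Scan())) {
                for (Result r : scanner) {
                    cells += r.size();
                }
            }
            // every handler needing the same header row waited on the RPCs above
            return cells;
        } finally {
            headerRowLock.unlock();
        }
    }
}
{code}
The scheduled-refresh idea in the comment above is essentially about pulling
step 3 out of that critical section entirely.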