[jira] [Commented] (PHOENIX-5239) Send persistent subquery cache to all regionservers

Lars Hofhansl (JIRA) Fri, 17 May 2019 15:19:40 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842691#comment-16842691
 ]


Lars Hofhansl commented on PHOENIX-5239:
----------------------------------------

As a separate query hint this is fine. That way a user can control the behavior.

That said...

I do agree with the sentiment here, that the overall effect on the system might 
not be what you expect, and you might end up hurting your performance or even 
causing outages. (We had our share of this from unexpected behaviors of global 
indexing, as well as UPSERTs and DELETEs blocking servers in tight server 
loops.)

Over time we removed more and more Phoenix logic from the server and put it 
back on the clients (UPSERT/SELECT, DELETE, server-side indexing retries, etc., 
and probably more in the future), and limited things like server-side 
sorting, for this very reason. HBase takes great care of managing the 
available memory in the region servers, and it does not expect something else 
to also consume heap memory.

Are we masking other Phoenix defects or gaps with this?
* Perhaps indexing has to be improved instead?
* Or stats?
* Perhaps this is caching that should happen in the Phoenix Query Server...? 
That would be a better place to put it.

Long term I'd even like to have a discussion about reverting PHOENIX-4666 (that 
would affect performance, not correctness).

[~johnp] we're not trying to be assholes - at least I am not. :)
I'm just concerned about the behavior of the system as a whole.


> Send persistent subquery cache to all regionservers
> ---------------------------------------------------
>
>                 Key: PHOENIX-5239
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5239
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: John Phillips
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PHOENIX-4666 introduced a persistent subquery cache that allows Phoenix to 
> cache the results from an expensive subquery (enabled with a 
> {{USE_PERSISTENT_CACHE}} query hint) to speed up subsequent queries.
> More context is available on the PHOENIX-4666 ticket, but a quick example 
> would be a query like:
> {code:java}
> SELECT /*+ USE_PERSISTENT_CACHE */ *
>     FROM table1
>     JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result
>     ON table1.id_1 = expensive_result.id_1
> WHERE table1.id_1 = [some_id]
> {code}
> Where lots of queries are run, differing only by {{some_id}}. Our usage 
> involves first running one query over Phoenix to warm the cache (which takes 
> ~20 seconds), then once complete, allowing the live queries to run, which 
> utilize the persistent subquery cache (~100ms).
> However, we noticed that when Phoenix sends the cache to the regionservers, 
> it looks at {{some_id}} in the outer query to figure out which regionservers 
> might contain {{table1.id_1 = [some_id]}} ([code 
> here|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283]).
>  This means that when we first start running the query, we'll inconsistently 
> hit the cache until it ends up being propagated to all the regionservers.
> Basically, we'd like to have some way to warm the subquery cache and ensure 
> it's on all the regionservers so subsequent queries will always find the 
> cache. I think the simplest solution might be updating the [if statement in 
> ServerCacheClient#addServerCache|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283]
>  to simply always send the cache to all the regionservers if it's a 
> persistent subquery:
> {code:java}
> - if ( ! servers.contains(entry) &&
> -         keyRanges.intersectRegion(regionStartKey, regionEndKey,
> -                 cacheUsingTable.getIndexType() == IndexType.LOCAL)) {
> + boolean keyRangesIntersect = keyRanges.intersectRegion(regionStartKey, regionEndKey,
> +         cacheUsingTable.getIndexType() == IndexType.LOCAL);
> + if (!servers.contains(entry) && (keyRangesIntersect || usePersistentCache)) {
> {code}
> I tested this out, and it seems to work as expected. If it sounds like an 
> acceptable solution, I'd be happy to make an actual PR. Or, if anyone has any 
> other suggestions on better ways to handle this, it would be much appreciated.
> FYI [~jamestaylor], [~elserj], and [~maryannxue] since it looks like you 
> three handled most of the review on the [original persistent cache 
> PR|https://github.com/apache/phoenix/pull/298]
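
To make the trade-off in the proposed change concrete, here is a minimal, 
self-contained sketch of the server-selection logic. It is not Phoenix code: 
the Region class, the integer row keys, and the selectServers method are all 
hypothetical stand-ins for what ServerCacheClient does with real key ranges. 
Only the shape of the condition (keyRangesIntersect || usePersistentCache) is 
taken from the diff above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model. None of these types are Phoenix classes.
class CacheTargetSketch {

    /** A region hosted on one server, covering row keys in [startKey, endKey). */
    static final class Region {
        final String server;
        final int startKey;
        final int endKey;

        Region(String server, int startKey, int endKey) {
            this.server = server;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** True if the query key range [lo, hi) overlaps the region's key range. */
    static boolean intersects(Region r, int lo, int hi) {
        return lo < r.endKey && r.startKey < hi;
    }

    /**
     * Picks the servers that would receive the cache. Without the hint, only
     * servers hosting an intersecting region are chosen. With the hint, every
     * server hosting a region of the table is chosen.
     */
    static Set<String> selectServers(List<Region> regions, int lo, int hi,
                                     boolean usePersistentCache) {
        Set<String> servers = new LinkedHashSet<>();
        for (Region r : regions) {
            boolean keyRangesIntersect = intersects(r, lo, hi);
            if (!servers.contains(r.server)
                    && (keyRangesIntersect || usePersistentCache)) {
                servers.add(r.server);
            }
        }
        return servers;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("rs1", 0, 100),
                new Region("rs2", 100, 200),
                new Region("rs3", 200, 300));
        // A point lookup on key 150 only touches the region on rs2.
        System.out.println(selectServers(regions, 150, 151, false)); // [rs2]
        // With the persistent-cache flag, every server is targeted.
        System.out.println(selectServers(regions, 150, 151, true));  // [rs1, rs2, rs3]
    }
}
```

In this toy setup the flag turns one cache copy into three. That is the 
behavior the reporter wants for warm-up, and it is also the source of the 
concern in the comment above: each regionserver would hold the cached subquery 
result in heap whether or not it ever serves a matching query.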



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
