[jira] [Commented] (PHOENIX-4666) Add a subquery cache that persists beyond the life of a query

ASF GitHub Bot (JIRA) Fri, 20 Apr 2018 11:39:14 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446181#comment-16446181 ]


ASF GitHub Bot commented on PHOENIX-4666:
-----------------------------------------

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/298#discussion_r183127442
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java ---
    @@ -216,22 +234,146 @@ public void close() throws SQLException {
                     }
                 }
             }
    -        
    +    }
    +    
    +    public ServerCache checkServerCache(final byte[] cacheId, ScanRanges keyRanges, final TableRef cacheUsingTableRef,
    --- End diff --
    
    I am wondering whether we can do this differently.
    Instead of making a "checkServerCache" RPC call for the first and every
    subsequent query, we would make NO calls, neither "check" nor "add", when
    the persistent-cache hint is present. We would catch the
    {{PersistentCacheNotFoundException}} on the first attempt (or on a later
    attempt, if the cache has somehow been evicted) and then add the cache all
    over again. I think this would be more efficient in general.
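
    A minimal sketch of the control flow proposed above, to make the RPC
    counts concrete. Everything in it is a hypothetical stand-in, not Phoenix
    code: FakeServer, addCache, runQuery, and queryWithPersistentCache are
    invented for illustration, and PersistentCacheNotFoundException is
    redefined locally because only its name is given in the comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PersistentCacheRetrySketch {

    /** Hypothetical stand-in for the exception named in the review comment. */
    static class PersistentCacheNotFoundException extends RuntimeException {
        PersistentCacheNotFoundException(String cacheId) {
            super("No persistent cache for id " + cacheId);
        }
    }

    /** Hypothetical stand-in for a server that holds cached subquery results. */
    static class FakeServer {
        final Map<String, String> caches = new HashMap<>();
        int addCalls = 0;    // counts simulated "add cache" RPCs
        int queryCalls = 0;  // counts simulated query RPCs

        void addCache(String cacheId, String payload) {
            addCalls++;
            caches.put(cacheId, payload);
        }

        String runQuery(String cacheId) {
            queryCalls++;
            String payload = caches.get(cacheId);
            if (payload == null) {
                throw new PersistentCacheNotFoundException(cacheId);
            }
            return "joined against " + payload;
        }
    }

    /**
     * Optimistic path: run the query without any "check" or "add" call first.
     * Only when the server reports a missing cache is the (expensive) payload
     * built, added, and the query retried once.
     */
    static String queryWithPersistentCache(FakeServer server, String cacheId,
                                           Supplier<String> buildPayload) {
        try {
            return server.runQuery(cacheId);
        } catch (PersistentCacheNotFoundException e) {
            server.addCache(cacheId, buildPayload.get());
            return server.runQuery(cacheId);
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        // First query misses, adds the cache, and retries.
        String first = queryWithPersistentCache(server, "c1", () -> "expensive_result");
        // Second query hits the cache and needs a single call.
        String second = queryWithPersistentCache(server, "c1", () -> "expensive_result");
        System.out.println(first + " | " + second
                + " | addCalls=" + server.addCalls
                + " queryCalls=" + server.queryCalls);
    }
}
```

    In this sketch a cache hit costs one call and a miss costs three (failed
    query, add, retry), whereas a check-first scheme pays an extra call on
    every query, including every hit.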


> Add a subquery cache that persists beyond the life of a query
> -------------------------------------------------------------
>
>                 Key: PHOENIX-4666
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4666
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Marcell Ortutay
>            Assignee: Marcell Ortutay
>            Priority: Major
>
> The user list thread for additional context is here: 
> [https://lists.apache.org/thread.html/e62a6f5d79bdf7cd238ea79aed8886816d21224d12b0f1fe9b6bb075@%3Cuser.phoenix.apache.org%3E]
> ----
> A Phoenix query may contain expensive subqueries, and moreover those 
> expensive subqueries may be used across multiple different queries. While 
> whole result caching is possible at the application level, it is not possible 
> to cache subresults in the application. This can cause bad performance for 
> queries in which the subquery is the most expensive part of the query, and 
> the application is powerless to do anything at the query level. It would be 
> good if Phoenix provided a way to cache subquery results, as it would provide 
> a significant performance gain.
> An illustrative example:
>     SELECT * FROM table1 JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result ON table1.id_1 = expensive_result.id_2 AND table1.id_1 = {id}
> In this case, the subquery "expensive_result" is expensive to compute, but it 
> doesn't change between queries. The rest of the query does, because of the 
> {id} parameter. This means the application can't cache the whole result, but 
> it would be good if there were a way to cache expensive_result.
> Note that there is currently a coprocessor-based "server cache", but the data 
> in this "cache" is not persisted across queries. It is deleted after a TTL 
> expires (30sec by default), or when the query completes.
> This issue is fairly high priority for us at 23andMe and we'd be happy to 
> provide a patch with some guidance from Phoenix maintainers. We are currently 
> putting together a design document for a solution, and we'll post it to this 
> Jira ticket for review in a few days.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
