[ https://issues.apache.org/jira/browse/PHOENIX-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611153#comment-17611153 ]

ASF GitHub Bot commented on PHOENIX-6761:
-----------------------------------------

gjacoby126 commented on code in PR #1506:
URL: https://github.com/apache/phoenix/pull/1506#discussion_r983709672


##########
phoenix-core/src/it/java/org/apache/phoenix/query/MetaDataCachingIT.java:
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.query;
+
+import org.apache.phoenix.end2end.NeedsOwnMiniClusterTest;
+import org.apache.phoenix.jdbc.PhoenixConnection;
+import org.apache.phoenix.jdbc.PhoenixDatabaseMetaData;
+import org.apache.phoenix.schema.*;
+import org.apache.phoenix.thirdparty.com.google.common.collect.Maps;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.apache.phoenix.util.RunUntilFailure;
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.util.Map;
+import java.util.Random;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+@RunWith(RunUntilFailure.class)
+@Category(NeedsOwnMiniClusterTest.class)
+public class MetaDataCachingIT extends BaseTest {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(MetaDataCachingIT.class);
+    private final Random RAND = new Random(11);
+
+    @BeforeClass
+    public static synchronized void doSetup() throws Exception {
+        Map<String, String> props = Maps.newHashMapWithExpectedSize(1);
+        // We set a tiny cache here to verify that even if the total size of the cache
+        // is just enough to hold system tables, Phoenix is still functional. Please
+        // note the cache weight for system tables is set to
Review Comment:
   nice comment, thanks



##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/AppendOnlySchemaIT.java:
##########
@@ -324,7 +326,11 @@ public void testValidateAttributes() throws Exception {
             assertEquals(1000, view.getUpdateCacheFrequency());
         }
     }
-    
+
+    /*
+    In PHOENIX-6761, the connection-level cache is removed, so the dropped view
+    will not be found when trying to upsert to it.

Review Comment:
   If the test isn't valid anymore, should it be removed?





> Phoenix Client Side Metadata Caching Improvement
> ------------------------------------------------
>
>                 Key: PHOENIX-6761
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6761
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Palash Chauhan
>            Priority: Major
>         Attachments: PHOENIX-6761.master.initial.patch
>
>
> CQSI maintains a client-side metadata cache of schemas, tables, and 
> functions that evicts the least recently used table entries when the cache 
> size grows beyond the configured limit.
> Each time a Phoenix connection is created, the client-side metadata cache 
> maintained by the CQSI object creating this connection is cloned for the 
> connection. Thus, we have two levels of caches: one at the Phoenix connection 
> level and the other at the CQSI level. 
> When a Phoenix client needs to update the client-side cache, it updates both 
> caches (on the connection object and on the CQSI object). The Phoenix client 
> first attempts to retrieve a table from the connection level cache. If the 
> table is not there, the Phoenix client does not check the CQSI level cache; 
> instead it retrieves the object from the server and finally updates both the 
> connection and CQSI level caches.
> PMetaDataCache provides caching for tables, schemas, and functions, but it 
> maintains separate caches internally, one for each type of metadata. 
> The cache for tables is actually a cache of PTableRef objects. A PTableRef 
> holds a reference to the table object as well as the estimated size of the 
> table object, the create time, the last access time, and the resolved time. 
> The create time is set to the last access time value provided when the 
> PTableRef object is inserted into the cache. The resolved time is also 
> provided when the PTableRef object is inserted into the cache. Both the 
> create time and resolved time are final fields (i.e., they are never 
> updated). PTableRef provides a setter method to update the last access time, 
> and PMetaDataCache updates the last access time whenever the table is 
> retrieved from the cache. The LRU eviction policy is implemented using the 
> last access time; no eviction policy is implemented for schemas and 
> functions. The configuration parameter for the frequency of updating the 
> cache is phoenix.default.update.cache.frequency, which can be defined at the 
> cluster or table level. When it is set to zero, the cache is not used.
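The shape of a PTableRef as described above can be sketched like this. Field and method names approximate the description; this is not the actual Phoenix class.

```java
// Illustrative sketch of the PTableRef shape described above.
public class PTableRefSketch {
    private final Object table;           // reference to the shared table (PTableImpl) object
    private final long estimatedSize;     // estimated byte size of the table object
    private final long createTime;        // final: set from the access time given at insertion
    private final long resolvedTime;      // final: supplied at insertion, never updated
    private volatile long lastAccessTime; // the only mutable field; drives LRU ordering

    public PTableRefSketch(Object table, long estimatedSize,
                           long lastAccessTime, long resolvedTime) {
        this.table = table;
        this.estimatedSize = estimatedSize;
        this.createTime = lastAccessTime; // create time mirrors the initial access time
        this.resolvedTime = resolvedTime;
        this.lastAccessTime = lastAccessTime;
    }

    public long getCreateTime() { return createTime; }
    public long getLastAccessTime() { return lastAccessTime; }

    // The cache calls this on every successful lookup to implement LRU.
    public void setLastAccessTime(long lastAccessTime) {
        this.lastAccessTime = lastAccessTime;
    }
}
```

Since only lastAccessTime is mutable, eviction order is entirely determined by which cache instance happens to perform the lookups.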
> Obviously, the purpose of eviction is to limit the memory consumed by the 
> cache. The expected behavior is that when a table is removed from the cache, 
> the table (PTableImpl) object is also garbage collected. However, this does 
> not really happen, because multiple caches hold references to the same object 
> and each cache maintains its own table refs and thus its own access times. 
> This means that the access time for the same table may differ from one cache 
> to another, and when one cache evicts an object, another cache may still hold 
> on to the same object. 
> Although each individual cache implements an LRU eviction policy, the overall 
> memory eviction policy for the actual table objects behaves more like an 
> age-based cache. If a table is frequently accessed through the connection 
> level caches, the last access time maintained by the corresponding table ref 
> objects for this table will be updated. However, these updates to the access 
> times are not visible to the CQSI level cache, whose table refs keep the same 
> create time and access time. 
> Since whenever an object is inserted into the local cache of a connection 
> object it is also inserted into the cache on the CQSI object, the CQSI level 
> cache will grow faster than the caches on the connection objects. When the 
> cache reaches its maximum size, each newly inserted table will result in 
> evicting one of the existing tables in the cache. Since the access times of 
> these tables are not updated in the CQSI level cache, it is likely that the 
> table that has stayed in the cache for the longest period of time will be 
> evicted (regardless of whether the same table is frequently accessed via the 
> connection level caches). This obviously defeats the purpose of an LRU cache.
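A toy demonstration of that divergence: two caches wrap the same table object in independent refs, so an access-time update made through one cache is invisible to the other. All names here are illustrative, not Phoenix code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy demonstration of per-cache access times diverging for one shared object.
public class DivergingAccessTimes {
    static class Ref {
        final Object table;   // the shared PTableImpl stand-in
        long lastAccessTime;
        Ref(Object table, long t) { this.table = table; this.lastAccessTime = t; }
    }

    // Returns {connection-level access time, CQSI-level access time} after a
    // simulated hot access through the connection-level cache only.
    static long[] simulate() {
        Object sharedTable = new Object(); // one object, referenced by both caches
        Map<String, Ref> connectionCache = new HashMap<>();
        Map<String, Ref> cqsiCache = new HashMap<>();
        connectionCache.put("T1", new Ref(sharedTable, 100L));
        cqsiCache.put("T1", new Ref(sharedTable, 100L));

        // Frequent access through the connection-level cache bumps only its ref...
        connectionCache.get("T1").lastAccessTime = 500L;

        // ...so the CQSI-level ref still looks cold, and CQSI-level LRU may
        // evict a table that is actually hot. Meanwhile, eviction from one
        // cache cannot free the object while the other still references it.
        return new long[] {
            connectionCache.get("T1").lastAccessTime,
            cqsiCache.get("T1").lastAccessTime
        };
    }
}
```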
> Another problem with the current cache is related to the choice of its 
> internal data structures and its eviction implementation. The table refs in 
> the cache are maintained in a hash map which maps a table key (a pair of a 
> tenant id and a table name) to a table ref. When the size of a cache (the 
> total byte size of the table objects referred to by the cache) reaches its 
> configured limit, the overage that adding a new table would cause is 
> computed. Then all the table refs in the cache are cloned into a priority 
> queue as well as into a new cache. This queue uses the access time to 
> determine the order of its elements (i.e., table refs). The table refs that 
> should not be evicted are removed from the queue, which leaves only the table 
> refs to be evicted in the queue. Finally, the table refs left in the queue 
> are removed from the new cache, and the new cache replaces the old one. It is 
> clear that this is an expensive operation in terms of memory allocations and 
> CPU time. Worse, once the cache reaches its limit, every insertion will 
> likely cause an eviction, and this expensive operation will be repeated for 
> each such insertion.
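A simplified sketch of that clone-and-prune eviction, to make its cost concrete. This is not the actual Phoenix code; it pops the oldest refs directly, which has the same net effect as pruning the survivors out of the queue.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Simplified sketch of the clone-and-prune eviction described above.
public class CloneAndPruneEvictionSketch {
    static class Ref {
        final String key;
        final long size;           // estimated byte size of the table object
        final long lastAccessTime; // drives the eviction order
        Ref(String key, long size, long lastAccessTime) {
            this.key = key;
            this.size = size;
            this.lastAccessTime = lastAccessTime;
        }
    }

    // Returns a NEW cache map with the oldest entries removed until at least
    // `overage` bytes are reclaimed.
    static Map<String, Ref> evict(Map<String, Ref> cache, long overage) {
        // 1. Clone every ref into a priority queue ordered by last access time
        //    (oldest first) and into a fresh map.
        PriorityQueue<Ref> byAge =
            new PriorityQueue<>(Comparator.comparingLong((Ref r) -> r.lastAccessTime));
        byAge.addAll(cache.values());
        Map<String, Ref> newCache = new HashMap<>(cache);

        // 2. Remove the oldest refs from the new map until enough bytes are freed.
        long freed = 0;
        while (freed < overage && !byAge.isEmpty()) {
            Ref victim = byAge.poll();
            newCache.remove(victim.key);
            freed += victim.size;
        }
        // 3. The caller replaces the old cache with the new map: an O(n log n)
        //    rebuild that can run on every insertion once the cache is full.
        return newCache;
    }
}
```

Even in this reduced form, every eviction copies the whole map and heapifies every ref, which is the cost the description objects to.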
> Since Phoenix connections are supposed to be short lived, maintaining a 
> separate cache for each connection object, and especially cloning the entire 
> cache content (and then pruning the entries belonging to other tenants when 
> the connection is a tenant specific connection), are not justified. The cost 
> of such a clone operation by itself would offset the gain of not accessing 
> the CQSI level cache, as the number of such accesses per connection should be 
> small given short lived Phoenix connections. 
> In addition, the impact of Phoenix connection leaks (connections that are not 
> closed by applications) and of simply long lived connections will be 
> exacerbated, since these connections will hold references to a large set of 
> table objects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
