eric-maynard commented on code in PR #490:
URL: https://github.com/apache/polaris/pull/490#discussion_r2000190401


##########
polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityWeigher.java:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.polaris.core.persistence.cache;
+
+import com.github.benmanes.caffeine.cache.Weigher;
+import org.checkerframework.checker.index.qual.NonNegative;
+
+/**
+ * A {@link Weigher} implementation that weighs {@link EntityCacheEntry} objects by the approximate
+ * size of the entity object.
+ */
+public class EntityWeigher implements Weigher<Long, EntityCacheEntry> {
+
+  /** The amount of weight that is expected to roughly equate to 1MB of memory usage */
+  public static final long WEIGHT_PER_MB = 1024 * 1024;
+
+  /* Represents the approximate size of an entity beyond the properties */
+  private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;
+
+  /** Singleton instance */
+  private static final EntityWeigher instance = new EntityWeigher();
+
+  private EntityWeigher() {}
+
+  /** Gets the singleton {@link EntityWeigher} */
+  public static EntityWeigher getInstance() {
+    return instance;
+  }
+
+  /**
+   * Computes the weight of a given entity
+   *
+   * @param key The entity's key; not used
+   * @param value The entity to be cached
+   * @return The weight of the entity
+   */
+  @Override
+  public @NonNegative int weigh(Long key, EntityCacheEntry value) {
+    return APPROXIMATE_ENTITY_OVERHEAD
+        + value.getEntity().getProperties().length()
+        + value.getEntity().getInternalProperties().length();

Review Comment:
   > Ideally, we should plot the weight computed by code and actual heap usage 
on the same chart and see how they align. They do not have to match perfectly, 
but the weight should be underestimate heap usage (after GC, of course).
   
   This is more or less what's done above -- we plot the heap usage against the 
number of characters written into the cache. We want to determine the 
relationship between characters written and heap usage, and the weight is only 
an implementation detail of how we get there.
   
   > Before GC runs any heap measurements we obtain from the JRE are tainted by 
trash objects.
   
   Yes and no. We may care about the memory usage both pre-GC and post-GC; both 
are meaningful to a user who's running an application and seeing the impact of 
the cache on memory pressure. I agree that attempting a GC before measurement 
makes our measurement something closer to "cache size".
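   
   To make the pre-GC versus post-GC distinction concrete, here is a minimal sketch of how the two samples could be taken with the standard `Runtime` API. The class and method names (`HeapProbe`, `usedHeapBytes`, `usedHeapBytesAfterGcHint`) are hypothetical and are not taken from the benchmark branch. Note that `System.gc()` is only a request to the JVM, so the "post-GC" number is best-effort.

```java
/** Hypothetical helper for sampling heap usage; not part of the PR or the benchmark branch. */
public final class HeapProbe {

  private HeapProbe() {}

  /** Heap bytes currently in use, including garbage that has not been collected yet. */
  public static long usedHeapBytes() {
    Runtime runtime = Runtime.getRuntime();
    return runtime.totalMemory() - runtime.freeMemory();
  }

  /**
   * Requests a collection and then samples the heap. System.gc() is a hint that the JVM may
   * ignore, so this only approximates the retained ("cache size") figure.
   */
  public static long usedHeapBytesAfterGcHint() {
    System.gc();
    return usedHeapBytes();
  }

  public static void main(String[] args) {
    long preGc = usedHeapBytes();
    long postGc = usedHeapBytesAfterGcHint();
    System.out.println("pre-GC bytes:  " + preGc);
    System.out.println("post-GC bytes: " + postGc);
  }
}
```

   Sampling both values after each batch of cache writes gives the two series discussed here: the pre-GC figure reflects the memory pressure an operator would observe, and the post-GC figure is closer to what the cache actually retains.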
   
   I pushed the code to my fork 
[here](https://github.com/eric-maynard/polaris/pull/new/cache-benchmark); 
please feel free to check out the code and perform your own tests if you'd 
like. I have refreshed the data in the spreadsheet linked above based on that 
code.
   
   From what I can see, a multiplier of 3 lands us in a good enough place (~60 
MB post-GC and ~150+ MB pre-GC) to move forward. If you observe differently in 
your testing or have any other concerns, please do share them below. I would 
reiterate that we shouldn't care exactly how big the cache is in bytes, only 
that the value is bounded and fairly close to the previous target of 100 MB.
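   
   As an illustration only, the arithmetic behind a multiplier could look like the sketch below. It assumes the multiplier of 3 is applied to the character counts of the two property strings and that the fixed overhead of 1000 from the diff is added unscaled; the comment above does not say where the multiplier is applied, so that placement, the class name `WeightSketch`, and the `MULTIPLIER` constant are all assumptions. The sketch takes plain strings instead of an `EntityCacheEntry` so that it runs without Polaris or Caffeine on the classpath.

```java
/** Hypothetical stand-alone sketch of a multiplier-based weight; not the PR's implementation. */
public final class WeightSketch {

  /** Weight expected to roughly equate to 1 MB, as in the diff above. */
  public static final long WEIGHT_PER_MB = 1024 * 1024;

  /** Fixed per-entity overhead, as in the diff above. */
  private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;

  /** Assumed placement: scales the property character counts only. */
  private static final int MULTIPLIER = 3;

  private WeightSketch() {}

  /** Computes a non-negative weight from the two property strings. */
  public static int weigh(String properties, String internalProperties) {
    long chars = (long) properties.length() + internalProperties.length();
    long weight = APPROXIMATE_ENTITY_OVERHEAD + MULTIPLIER * chars;
    // Saturate rather than overflow, since a weigher must not return a negative value.
    return (int) Math.min(Integer.MAX_VALUE, weight);
  }

  public static void main(String[] args) {
    // 1000 overhead + 3 * (3 + 2) characters
    System.out.println(weigh("abc", "de")); // prints 1015
    // Total weight budget corresponding to a 100 MB target
    System.out.println(100 * WEIGHT_PER_MB); // prints 104857600
  }
}
```

   Under these assumptions the total weight budget stays a fixed number, which is the property that matters here: the cache is bounded regardless of how closely the weight tracks real heap bytes.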


