eric-maynard commented on code in PR #490:
URL: https://github.com/apache/polaris/pull/490#discussion_r2000190401
##########
polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityWeigher.java:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.polaris.core.persistence.cache;
+
+import com.github.benmanes.caffeine.cache.Weigher;
+import org.checkerframework.checker.index.qual.NonNegative;
+
+/**
+ * A {@link Weigher} implementation that weighs {@link EntityCacheEntry} objects by the approximate
+ * size of the entity object.
+ */
+public class EntityWeigher implements Weigher<Long, EntityCacheEntry> {
+
+  /** The amount of weight that is expected to roughly equate to 1MB of memory usage */
+  public static final long WEIGHT_PER_MB = 1024 * 1024;
+
+  /* Represents the approximate size of an entity beyond the properties */
+  private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;
+
+  /** Singleton instance */
+  private static final EntityWeigher instance = new EntityWeigher();
+
+  private EntityWeigher() {}
+
+  /** Gets the singleton {@link EntityWeigher} */
+  public static EntityWeigher getInstance() {
+    return instance;
+  }
+
+  /**
+   * Computes the weight of a given entity
+   *
+   * @param key The entity's key; not used
+   * @param value The entity to be cached
+   * @return The weight of the entity
+   */
+  @Override
+  public @NonNegative int weigh(Long key, EntityCacheEntry value) {
+    return APPROXIMATE_ENTITY_OVERHEAD
+        + value.getEntity().getProperties().length()
+        + value.getEntity().getInternalProperties().length();

Review Comment:
> Ideally, we should plot the weight computed by code and actual heap usage on the same chart and see how they align. They do not have to match perfectly, but the weight should be underestimate heap usage (after GC, of course).

This is more or less what's done above: we plot the heap usage against the number of characters written into the cache. We want to determine the relationship between bytes written and heap usage, and the weight is only an implementation detail of how we get there.

> Before GC runs any heap measurements we obtain from the JRE are tainted by trash objects.

Yes and no. We may care about the memory usage both pre-GC and post-GC; both are meaningful to a user who is running an application and seeing the impact of the cache on memory pressure. I agree that attempting a GC before measurement makes our measurement something closer to "cache size".

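
To make the pre-GC versus post-GC distinction concrete, here is a minimal sketch of how the two readings could be taken. It is not the benchmark from the fork: the class `HeapProbe`, its method names, and the entry sizes are hypothetical, and `System.gc()` is only a hint to the JVM, so the post-GC number is an approximation at best.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Polaris or of the benchmark in the fork.
class HeapProbe {

  /** Used heap as the JVM reports it right now; may still include unreachable objects. */
  static long usedHeapBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /** Requests a collection first. System.gc() is a hint, so this is best effort only. */
  static long usedHeapBytesAfterGc() {
    System.gc();
    try {
      Thread.sleep(100); // give a concurrent collector a moment to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return usedHeapBytes();
  }

  public static void main(String[] args) {
    long baseline = usedHeapBytesAfterGc();

    // Stand-in for cache entries: retained strings, plus short-lived garbage from building them.
    List<String> retained = new ArrayList<>();
    long charsWritten = 0;
    for (int i = 0; i < 10_000; i++) {
      String entry = "x".repeat(1_000) + i; // the intermediate repeat() result becomes garbage
      retained.add(entry);
      charsWritten += entry.length();
    }

    long preGc = usedHeapBytes() - baseline; // closer to the memory pressure a user would see
    long postGc = usedHeapBytesAfterGc() - baseline; // closer to "cache size"

    System.out.printf("chars written: %d%n", charsWritten);
    System.out.printf("heap delta pre-GC: %d bytes%n", preGc);
    System.out.printf("heap delta post-GC: %d bytes%n", postGc);

    // Touch the list after measuring so the entries stay reachable until this point.
    if (retained.isEmpty()) {
      throw new IllegalStateException("unreachable");
    }
  }
}
```

Plotting both deltas against `charsWritten` over several runs gives the kind of chart discussed here; the exact numbers will vary by JVM, collector, and heap settings.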
I pushed the code to my fork [here](https://github.com/eric-maynard/polaris/pull/new/cache-benchmark); please feel free to check out the code and run your own tests if you'd like. I have refreshed the data in the spreadsheet linked above based on that code. From what I can see, a multiplier of 3 lands us in a good enough place (~60 MB post-GC and ~150+ MB pre-GC) to move forward. If you observe something different in your testing or have any other concerns, please share them below. I would reiterate that we shouldn't care exactly how big the cache is in bytes, only that the value is bounded and fairly close to the previous target of 100 MB.
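
As a rough illustration of what a multiplier of 3 means for the weight arithmetic, here is a standalone sketch. It assumes the multiplier is applied per character of the two property strings, which this comment does not spell out; the class `WeightSketch`, the constant `WEIGHT_PER_CHAR`, and the helper `maxWeightFor` are hypothetical names, while `WEIGHT_PER_MB` and `APPROXIMATE_ENTITY_OVERHEAD` mirror the values in the diff above.

```java
// Hypothetical standalone sketch; it does not depend on Caffeine or on Polaris classes.
final class WeightSketch {

  /** Mirrors EntityWeigher.WEIGHT_PER_MB from the diff. */
  static final long WEIGHT_PER_MB = 1024 * 1024;

  /** Mirrors APPROXIMATE_ENTITY_OVERHEAD from the diff. */
  static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;

  /** Assumption: the multiplier of 3 scales the character count of the properties. */
  static final int WEIGHT_PER_CHAR = 3;

  /** Weight of one entity, given its two serialized property strings. */
  static int weigh(String properties, String internalProperties) {
    return APPROXIMATE_ENTITY_OVERHEAD
        + WEIGHT_PER_CHAR * (properties.length() + internalProperties.length());
  }

  /** Total weight budget corresponding to a target expressed in MB. */
  static long maxWeightFor(long targetMb) {
    return targetMb * WEIGHT_PER_MB;
  }

  public static void main(String[] args) {
    String properties = "p".repeat(1_000); // 1,000 characters of properties
    String internalProperties = ""; // no internal properties in this example

    int weight = weigh(properties, internalProperties);
    long budget = maxWeightFor(100); // the 100 MB target mentioned in the comment

    System.out.println("weight of one entity: " + weight);
    System.out.println("weight budget: " + budget);
    System.out.println("entities of this size that fit: " + (budget / weight));
  }
}
```

The point of the sketch is that the budget bounds the sum of weights, not bytes on the heap; the multiplier only tunes how closely that bound tracks the measured heap usage.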
