poorbarcode commented on code in PR #24457: URL: https://github.com/apache/pulsar/pull/24457#discussion_r2166714549
########## pulsar-common/src/main/java/org/apache/pulsar/common/naming/TopicName.java: ########## @@ -152,18 +139,18 @@ private TopicName(String completeTopicName) { parts = Splitter.on("/").limit(4).splitToList(rest); if (parts.size() == 3) { // New topic name without cluster name - this.tenant = parts.get(0); + this.tenant = StringInterner.intern(parts.get(0)); this.cluster = null; - this.namespacePortion = parts.get(1); - this.localName = parts.get(2); + this.namespacePortion = StringInterner.intern(parts.get(1)); + this.localName = StringInterner.intern(parts.get(2)); this.partitionIndex = getPartitionIndex(completeTopicName); this.namespaceName = NamespaceName.get(tenant, namespacePortion); } else if (parts.size() == 4) { // Legacy topic name that includes cluster name - this.tenant = parts.get(0); - this.cluster = parts.get(1); - this.namespacePortion = parts.get(2); - this.localName = parts.get(3); + this.tenant = StringInterner.intern(parts.get(0)); + this.cluster = StringInterner.intern(parts.get(1)); + this.namespacePortion = StringInterner.intern(parts.get(2)); + this.localName = StringInterner.intern(parts.get(3)); Review Comment: > The duplicate tenant and namespace parts add up. Within a cluster, lookups could happen on any broker. The topic pattern listing use case will load up more topic names into a single broker than it has assigned to it. For partitioned topics this is multiplied by the number of partitions. There's a need to optimize memory usage of topic pattern listing use cases and String instance deduplication is a key solution in that. We can come back to that later. Let's review the lookup process, and you will understand where the problem lies 1. Received a `lookup` request 2. If the Topic Name in the cache a. Broker gets the cached TopicName b. Release the object `lookup` 3. If the Topic Name is not in the cache a. after a split, Broker gets a string array `[tenant, namespace, topic]` b. The Broker calls `intern(tenant), intern(namespace), intern(topic)` here, and get a new TopicName with shared `tenant & namespace` 4. After the broker responds to the client a. JVM will release the string array b. JVM will release the TopicName object, since it is not useful anymore. Whether there is a step 3-b or not, the objects that the JVM needs to release (ultimately) are the same: - a string array `[tenant, namespace, topic]` - a TopicName. But the step 3-b costs CPU resource. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org