poorbarcode commented on code in PR #24457:
URL: https://github.com/apache/pulsar/pull/24457#discussion_r2166714549


##########
pulsar-common/src/main/java/org/apache/pulsar/common/naming/TopicName.java:
##########
@@ -152,18 +139,18 @@ private TopicName(String completeTopicName) {
             parts = Splitter.on("/").limit(4).splitToList(rest);
             if (parts.size() == 3) {
                 // New topic name without cluster name
-                this.tenant = parts.get(0);
+                this.tenant = StringInterner.intern(parts.get(0));
                 this.cluster = null;
-                this.namespacePortion = parts.get(1);
-                this.localName = parts.get(2);
+                this.namespacePortion = StringInterner.intern(parts.get(1));
+                this.localName = StringInterner.intern(parts.get(2));
                 this.partitionIndex = getPartitionIndex(completeTopicName);
                 this.namespaceName = NamespaceName.get(tenant, 
namespacePortion);
             } else if (parts.size() == 4) {
                 // Legacy topic name that includes cluster name
-                this.tenant = parts.get(0);
-                this.cluster = parts.get(1);
-                this.namespacePortion = parts.get(2);
-                this.localName = parts.get(3);
+                this.tenant = StringInterner.intern(parts.get(0));
+                this.cluster = StringInterner.intern(parts.get(1));
+                this.namespacePortion = StringInterner.intern(parts.get(2));
+                this.localName = StringInterner.intern(parts.get(3));

Review Comment:
   > The duplicate tenant and namespace parts add up. Within a cluster, lookups 
could happen on any broker. The topic pattern listing use case will load up 
more topic names into a single broker than it has assigned to it. For 
partitioned topics this is multiplied by the number of partitions. There's a 
need to optimize memory usage of topic pattern listing use cases and String 
instance deduplication is a key solution in that. We can come back to that 
later.
   
   Let's review the lookup process, and you will understand where the problem 
lies
   1. Received a `lookup` request
   2. If the Topic Name in the cache
     a. Broker gets the cached TopicName 
     b. Release the object `lookup`
   3. If the Topic Name is not in the cache 
     a. after a split, Broker gets a string array `[tenant, namespace, topic]`
     b. The Broker calls `intern(tenant), intern(namespace), intern(topic)` 
here, and get a new TopicName with shared `tenant & namespace`
   4. After the broker responds to the client
     a. JVM will release the string array
     b. JVM will release the TopicName object, since it is not useful anymore.
   
   Whether there is a step 3-b or not, the objects that the JVM needs to 
release (ultimately) are the same:
   - a string array `[tenant, namespace, topic]`
   - a TopicName.
   
   But the step 3-b costs CPU resource.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to