tschmidt64 opened a new issue #6620: Java Topic URL Parser is Incorrect
URL: https://github.com/apache/pulsar/issues/6620
 
 
   **Describe the bug**
   I think the code that parses the topic name does not parse topic names 
correctly. I will limit my example to the [consumer 
code](https://github.com/apache/pulsar/blob/639fbf1801b227bb191cfc548f26906902481b8f/pulsar-common/src/main/java/org/apache/pulsar/common/naming/TopicName.java#L126)
 but I assume similar issues will apply to readers and producers.
   
   From the code (line numbers are for reference, they do not match the actual 
code):
   
   ```java
      1  // The fully qualified topic name can be in two different forms:
      2  // new:    persistent://tenant/namespace/topic
      3  // legacy: persistent://tenant/cluster/namespace/topic
      4  
      5  List<String> parts = 
Splitter.on("://").limit(2).splitToList(completeTopicName);
      6  this.domain = TopicDomain.getEnum(parts.get(0));
      7  
      8  String rest = parts.get(1);
      9  
     10  // The rest of the name can be in different forms:
     11  // new:    tenant/namespace/<localName>
     12  // legacy: tenant/cluster/namespace/<localName>
     13  // Examples of localName:
     14  // 1. some/name/xyz//
     15  // 2. /xyz-123/feeder-2
     16  
     17  
     18  parts = Splitter.on("/").limit(4).splitToList(rest);
     19  if (parts.size() == 3) {
     20      // New topic name without cluster name
     21      this.tenant = parts.get(0);
     22      this.cluster = null;
     23      this.namespacePortion = parts.get(1);
     24      this.localName = parts.get(2);
     25      this.partitionIndex = getPartitionIndex(completeTopicName);
     26      this.namespaceName = NamespaceName.get(tenant, namespacePortion);
     27  } else if (parts.size() == 4) {
     28      // Legacy topic name that includes cluster name
     29      this.tenant = parts.get(0);
     30      this.cluster = parts.get(1);
     31      this.namespacePortion = parts.get(2);
     32      this.localName = parts.get(3);
     33      this.partitionIndex = getPartitionIndex(completeTopicName);
     34      this.namespaceName = NamespaceName.get(tenant, cluster, 
namespacePortion);
     35  } else {
     36      throw new IllegalArgumentException("Invalid topic name: " + 
completeTopicName);
     37  }
     38  
   ```
   
   
   **To Reproduce**
   As an breaking example, take the non-legacy web socket topic URL:
   
   ```
   persistent://my-tenant/my-ns/topic/with/slashes
   ```
   I'll post the original line and then what that translates to with this 
example:
   ```java
      8  String rest = parts.get(1);
      // String rest = "my-tenant/my-ns/topic/with/slashes";
         ...
     18  parts = Splitter.on("/").limit(4).splitToList(rest);
      // parts = ["my-tenant", "my-ns", "topic", "with/slashes"];
   ```
   
   Good so far. But then the if condition:
   ```java
     19  if (parts.size() == 3) {
     20      // New topic name without cluster name
     21      this.tenant = parts.get(0);
     22      this.cluster = null;
     23      this.namespacePortion = parts.get(1);
     24      this.localName = parts.get(2);
     25      this.partitionIndex = getPartitionIndex(completeTopicName);
     26      this.namespaceName = NamespaceName.get(tenant, namespacePortion);
     27  } else if (parts.size() == 4) {
     28      // Legacy topic name that includes cluster name
     29      this.tenant = parts.get(0);
     30      this.cluster = parts.get(1);
     31      this.namespacePortion = parts.get(2);
     32      this.localName = parts.get(3);
     33      this.partitionIndex = getPartitionIndex(completeTopicName);
     34      this.namespaceName = NamespaceName.get(tenant, cluster, 
namespacePortion);
     35  } else {
     36      throw new IllegalArgumentException("Invalid topic name: " + 
completeTopicName);
     37  }
   ```
   
   In this example, `parts.size()` is equal to `4`, which means it will enter 
the `else if` block. This is the block for parsing the legacy code, as `line 
28` mentions
   
   Then lines `29-32` get run (translations included):
   ```java
      // parts = ["my-tenant", "my-ns", "topic", "with/slashes"];
   
         ...
     29  this.tenant = parts.get(0);
      // this.tenant = "my-tenant";        // Good
     30  this.cluster = parts.get(1);
      // this.cluster = "my-ns";           // **BAD**
     31  this.namespacePortion = parts.get(2);
      // this.namespacePortion = "topic";  // **BAD**
     32  this.localName = parts.get(3);
      // this.localName = "with/slashes";  // **BAD**
   
   ```
   
   **Expected behavior**
   The code should not enter the `else if` condition. It should enter the `if` 
condition. 
   This line assumes that the topic will not have any slashes in it. But this 
assumption is incorrect. The 
[documentation](http://pulsar.apache.org/docs/en/concepts-messaging/#topics) 
explicitly states that "topic names are freeform". The Java 
[code](https://github.com/apache/pulsar/blob/639fbf1801b227bb191cfc548f26906902481b8f/pulsar-common/src/test/java/org/apache/pulsar/common/naming/TopicNameTest.java#L180)
 even tests for this case. 
   
   **Screenshots**
   N/a
   
   **Desktop (please complete the following information):**
   N/a
   
   **Additional context**
   N/a

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to