lhotari commented on PR #25822:
URL: https://github.com/apache/pulsar/pull/25822#issuecomment-4500269281

   These are findings from a local Claude Code review. Please check before 
merging — they are suggestions, not blockers, and some may be intentional 
design choices.
   
   ### Findings
   
   1. **[BUG] `toScalableTopic()` doesn't strip the `-partition-K` suffix** — 
`pulsar-common/.../TopicName.java`
      `getEncodedLocalName()` returns the full local name *including* a 
partition suffix. A lookup against `persistent://t/n/x-partition-3` resolves to 
`topic://t/n/x-partition-3` rather than the canonical `topic://t/n/x`. The PR 
description and Javadoc promise "the canonical `topic://t/n/x` identity", so 
either the contract is wrong or the implementation is. Suggested fix: use 
`TopicName.get(getPartitionedTopicName()).getLocalName()` (or short-circuit on 
`isPartitioned()`) before building the new name.
   
   2. **[BUG] Partitioned-input synthetic layout chains partition suffixes** — 
`DagWatchSession#buildSyntheticResponse`
      `topicName.getPartition(k)` only short-circuits when the existing suffix 
matches `-partition-K` for the same index `K`. If a client supplies 
`persistent://t/n/x-partition-0`, `getPartition(1)` produces 
`persistent://t/n/x-partition-0-partition-1` (a non-existent topic). Combined 
with #1, partition-specific inputs are accepted but produce nonsense 
underlying-topic names. Either reject inputs where `topicName.isPartitioned()` 
early, or normalize to the base partitioned-topic name before calling 
`getPartition(k)`.
   
   3. **[BUG] `HashRange` validation throws when `partitions > 65536`** — 
`DagWatchSession#buildSyntheticResponse`
      `int width = 0x10000 / partitions;` becomes `0` once partitions exceed 
the 16-bit hash space, and the next line computes `end = start + width - 1 = 
-1`, which fails `HashRange`'s `start <= end` invariant and surfaces as an 
`IllegalArgumentException` rather than a clean lookup error. Practical exposure 
is low (broker `maxNumPartitionsPerPartitionedTopic` is typically smaller), but 
the failure mode is ugly. Either cap partitions or assign every range as a 
degenerate single-hash slot (which works since "routing is mod-N over 
segment_id" anyway, per the existing comment).
   
   4. **[QUALITY] Synthetic layout is built for topics that don't exist** — 
`DagWatchSession#buildSyntheticResponse`
      `fetchPartitionedTopicMetadataAsync` returns 
`PartitionedTopicMetadata(0)` for both non-partitioned topics *and* topics that 
don't exist. The pre-PR behavior was to fail with 
`IllegalStateException(\"Scalable topic not found: ...\")`; the new behavior is 
to return a synthetic single-segment layout pointing at a possibly non-existent 
`persistent://...` topic. This may be intentional (auto-create on connect 
downstream), but it's a real behavior change. A quick 
`checkTopicExists`/`namespaceExists` guard for the non-partitioned case would 
tighten this; at minimum the Javadoc should call it out.
   
   5. **[QUALITY] `SegmentInfo.isSpecial()` accepts empty string**
      `underlyingTopicName \!= null` returns `true` for the empty string. 
Factory methods guard against this, but the record's canonical constructor is 
public; one stray `\"\"` would silently break the special/regular distinction. 
Cheap to harden: `return underlyingTopicName \!= null && 
\!underlyingTopicName.isEmpty();` and/or reject blanks in the canonical 
constructor.
   
   6. **[QUALITY] `Commands.newScalableTopicUpdate` doesn't handle a null 
`resolvedTopicName`**
      Lightproto's `setResolvedTopicName(null)` will NPE rather than leave the 
field unset. Today all callers pass a non-null value, but the signature accepts 
`String` without `@Nonnull` and the proto field is `optional`. Either annotate 
the param or guard the setter (`if (resolvedTopicName \!= null) ...`).
   
   7. **[QUALITY] Cosmetic hash ranges in synthetic partitioned layout**
      The comment correctly notes ranges are "cosmetic — routing for synthetic 
layouts is mod-N over segment_id". That contract lives entirely in this method 
plus a future SDK PR. Worth a `// see PIP-475 §X` link or shared constant so 
broker- and client-side conventions don't drift. Specifically: if the SDK ever 
consults `hash_start`/`hash_end` for synthetic layouts, partition counts that 
don't divide 0x10000 evenly will misroute by one bucket per cycle.
   
   8. **[QUALITY] V1 namespace handling in `toScalableTopic()` not asserted**
      For a V1-style name like `persistent://tenant/cluster/ns/x`, 
`getNamespace()` returns the 3-segment form `tenant/cluster/ns`, producing 
`topic://tenant/cluster/ns/x`. `TopicName.get()` should round-trip that, but 
there are no tests exercising it. A one-line unit test (or an explicit guard 
against V1) would prevent surprise.
   
   9. **[QUALITY] `pushUpdate` race with watch-fired updates (pre-existing, but 
widened here)**
      The watch is registered *before* the initial metadata fetch, so an 
`onNotification` for the canonical path can deliver a `pushUpdate(...)` to the 
client *before* `ServerCnx` finishes wiring up 
`start().thenAcceptAsync(session::pushUpdate, ...)`. Not introduced by this PR, 
but the new synthetic-layout path widens the window because the initial 
response now does an extra `fetchPartitionedTopicMetadataAsync` round-trip. 
Worth tracking even if deferred.
   
   10. **[INTENT MISMATCH — minor] Short-form input not exercised by tests**
       The PR description claims short-form topic names are accepted. That 
works because `TopicName.get()` already normalizes short forms to 
`persistent://public/default/<name>`, but the PR itself adds no test for it. 
Consider one unit test asserting that a short-form input yields the expected 
`resolvedTopicName`.
   
   ### Non-issues / nits
   - Proto field number `11` for `underlying_topic_name` is the next free 
number — no conflict.
   - `LinkedHashMap` for `segments` preserves insertion order so partition-K → 
segment-K mapping is stable on the wire.
   - The watch-replaces-synthetic mechanism is correct: `onMetadataChanged` 
routes through `buildResponse(metadata)` (the real path), so a later migration 
transparently swaps the synthetic layout.
   - `ServerCnx` non-persistent rejection happens before `isRunning`/resources 
checks — fine, the early reject leaks no information that isn't already public.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to