liushiqi1001 commented on PR #3092:
URL: https://github.com/apache/dubbo-go/pull/3092#issuecomment-3639933783

   ## 📋 Description
   
   This PR fixes a panic that crashes Go services when they retrieve metadata from Java Dubbo providers running version 3.2.4, or any other version that returns a `string` where a `MetadataInfo` object is expected.
   
   ### Problem
   
   When Go consumers try to fetch metadata from certain Java Dubbo providers, 
the service crashes with:
   
   ```
   panic: reflect.Set: value of type string is not assignable to type info.MetadataInfo
   ```
   
   **Root Cause**: The panic occurs inside the Hessian2 deserializer when Java Dubbo returns a `string` instead of a `MetadataInfo` object.
   
   #### Why Does Java Dubbo Return a String?
   
   Java Dubbo MetadataService behavior differs between startup and normal 
operation:
   
   1. **During Java service startup**
      - MetadataService starts before metadata is fully prepared
      - Returns empty string: `""`
      - This is a **transient state** (typically lasts 1-2 seconds)
      - Root cause: Nacos pushes the instance immediately after registration, but metadata preparation is asynchronous
      - Applies to **all Java Dubbo versions**
   
   2. **Normal operation**
      - MetadataService returns `MetadataInfo` object via Hessian2 serialization
      - Directly deserializes to Go struct
      - Works reliably after startup completes
   
   **The Problem with Old Code**:
   ```go
   // Old code passed strongly-typed struct as reply parameter
   metadataInfo := &info.MetadataInfo{}
   inv, _ := generateInvocation(..., metadataInfo, ...)
   res := m.invoker.Invoke(...)  // ← Panic happens HERE inside Invoke()
   ```
   
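   When Java returns a `string`, Hessian2 effectively attempts:
   ```go
   // Simplified: assign the decoded string to the struct behind the reply pointer
   reflect.ValueOf(metadataInfo).Elem().Set(reflect.ValueOf(stringValue))  // ❌ Panic!
   // Error: "value of type string is not assignable to type info.MetadataInfo"
   ```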
   
   This panic occurs **during RPC call execution**, inside `Invoke()`, before the caller has any chance to inspect the result with a type assertion.
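
   The failure mode can be reproduced outside Dubbo with plain reflection. The sketch below is illustrative only: `MetadataInfo` is a one-field stand-in for `info.MetadataInfo`, and `setReply` is a hypothetical helper that mimics a decoder writing into the caller's reply pointer. It is not the Hessian2 code path itself.

   ```go
   package main

   import (
       "fmt"
       "reflect"
   )

   // MetadataInfo is a stand-in for info.MetadataInfo (assumption: the real struct has more fields).
   type MetadataInfo struct {
       App string
   }

   // setReply mimics a decoder assigning a decoded value to the variable behind
   // the reply pointer. The reflect panic is recovered and returned as an error
   // so the demo can print it instead of crashing.
   func setReply(reply interface{}, decoded interface{}) (err error) {
       defer func() {
           if r := recover(); r != nil {
               err = fmt.Errorf("recovered: %v", r)
           }
       }()
       reflect.ValueOf(reply).Elem().Set(reflect.ValueOf(decoded))
       return nil
   }

   func main() {
       reply := &MetadataInfo{}
       // A string is not assignable to a struct-typed target, so Set panics.
       fmt.Println(setReply(reply, ""))
       // A value of the matching type is assigned without issue.
       fmt.Println(setReply(reply, MetadataInfo{App: "demo"}), reply.App)
   }
   ```

   Without the `recover`, the first call would terminate the process, which matches the crash described above.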
   
   ## 🔧 Solution
   
   ### Key Changes
   
   **1. Use `interface{}` as reply parameter** (`metadata/client.go`)
   
   Instead of passing a pointer to a strongly-typed struct, we now pass a pointer to an `interface{}` variable, which allows Hessian2 to store a value of any type without panicking:
   
   ```go
   // Before
   metadataInfo := &info.MetadataInfo{}
   inv, _ := generateInvocation(..., metadataInfo, ...)  // ❌ Panics on type mismatch
   
   // After
   var rawResult interface{}
   inv, _ := generateInvocation(..., &rawResult, ...)    // ✅ Accepts any type
   ```
   
   **Why this works**: Hessian2's `reflectResponse()` function 
(codec.go:474-477) has special handling for `interface{}` types - it skips type 
validation and directly assigns the value.
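
   The underlying Go rule is that every type is assignable to `interface{}`. A minimal sketch of that rule, using a hypothetical `assign` helper rather than the actual `reflectResponse()` code:

   ```go
   package main

   import (
       "fmt"
       "reflect"
   )

   // assign stores decoded in the variable that reply points to, via reflection.
   // It is a hypothetical helper for illustration, not part of dubbo-go.
   func assign(reply interface{}, decoded interface{}) {
       reflect.ValueOf(reply).Elem().Set(reflect.ValueOf(decoded))
   }

   func main() {
       var rawResult interface{}

       // An empty string is accepted because string implements interface{}.
       assign(&rawResult, "")
       fmt.Printf("%T\n", rawResult)

       // So is any other value; the caller decides later how to interpret it.
       assign(&rawResult, map[string]string{"app": "demo"})
       fmt.Printf("%T\n", rawResult)
   }
   ```

   The trade-off is that type checking moves from the decoder to the caller, which is why the next change adds an explicit type assertion.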
   
   **2. Safe type assertion with fallback**
   
   After receiving the result, we safely handle both types:
   
   ```go
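   if result, ok := rawResult.(*info.MetadataInfo); ok {
       // Normal case: Hessian2 decoded a MetadataInfo object
       metadataInfo = result
   } else if strValue, ok := rawResult.(string); ok {
       // Fallback: the provider returned a string, so try to parse it as JSON
       metadataInfo = &info.MetadataInfo{}
       if err := json.Unmarshal([]byte(strValue), metadataInfo); err != nil {
           // Propagate the parse failure (e.g. empty string during provider startup)
           return nil, err
       }
   }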
   ```
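
   The same handling can be written as a type switch that also rejects unexpected types. This is a sketch under assumptions: `MetadataInfo` is a simplified stand-in for `info.MetadataInfo`, and the JSON field names, the function name `toMetadataInfo`, and the error texts are illustrative rather than taken from the PR.

   ```go
   package main

   import (
       "encoding/json"
       "errors"
       "fmt"
   )

   // MetadataInfo is a simplified stand-in for info.MetadataInfo (field names are assumptions).
   type MetadataInfo struct {
       App      string `json:"app"`
       Revision string `json:"revision"`
   }

   // toMetadataInfo converts a loosely typed RPC result into *MetadataInfo.
   func toMetadataInfo(raw interface{}) (*MetadataInfo, error) {
       switch v := raw.(type) {
       case *MetadataInfo:
           // Already decoded as an object.
           return v, nil
       case string:
           if v == "" {
               // Transient startup state: nothing to parse yet.
               return nil, errors.New("metadata not ready: provider returned an empty string")
           }
           mi := &MetadataInfo{}
           if err := json.Unmarshal([]byte(v), mi); err != nil {
               return nil, fmt.Errorf("parse metadata JSON: %w", err)
           }
           return mi, nil
       default:
           return nil, fmt.Errorf("unexpected metadata type %T", raw)
       }
   }

   func main() {
       inputs := []interface{}{
           &MetadataInfo{App: "demo"},
           `{"app":"demo","revision":"r1"}`,
           "",
           42,
       }
       for _, raw := range inputs {
           mi, err := toMetadataInfo(raw)
           fmt.Println(mi, err)
       }
   }
   ```

   Returning an error for the empty string lets the caller treat "metadata not ready" like any other per-instance failure.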
   
   **3. Graceful degradation** (`service_instances_changed_listener_impl.go`)
   
   Changed error handling from `return err` to `continue`, allowing the service 
to skip problematic instances and try others:
   
   ```go
   if err != nil {
       logger.Warnf("Failed to get metadata from instance %s, skipping", instance.GetHost())
       continue  // Skip and try next instance
   }
   ```
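
   A self-contained sketch of this skip-and-continue pattern follows. The names `fetchFunc`, `collectMetadata`, and `demoFetch`, as well as the host addresses, are hypothetical; the real listener works with service instances and revisions rather than plain strings.

   ```go
   package main

   import (
       "errors"
       "fmt"
   )

   // fetchFunc is a hypothetical stand-in for the metadata RPC call.
   type fetchFunc func(host string) (string, error)

   // collectMetadata tries every host and skips the ones that fail, so a single
   // unready provider does not abort processing of the whole instance list.
   func collectMetadata(hosts []string, fetch fetchFunc) map[string]string {
       result := make(map[string]string)
       for _, host := range hosts {
           meta, err := fetch(host)
           if err != nil {
               fmt.Printf("WARN failed to get metadata from instance %s, skipping: %v\n", host, err)
               continue // skip and try the next instance
           }
           result[host] = meta
       }
       return result
   }

   // demoFetch simulates one provider that is still starting up.
   func demoFetch(host string) (string, error) {
       if host == "10.0.0.1" {
           return "", errors.New("metadata not ready")
       }
       return "revision-" + host, nil
   }

   func main() {
       got := collectMetadata([]string{"10.0.0.1", "10.0.0.2"}, demoFetch)
       fmt.Println(len(got), got["10.0.0.2"])
   }
   ```

   A skipped instance is picked up again on the next registry notification, which is the recovery behavior shown in the test logs.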
   
   ## ✅ Testing
   
   ### Production Verification
   
   Tested with Java Dubbo providers in production environment, demonstrating 
the complete lifecycle from startup failure to automatic recovery.
   
   **Test Case 1: First push during Java service startup (metadata not ready)**
   ```
   2025-12-11 02:46:57  WARN  [MetadataRPC] Provider 172.30.26.245:20880 returned string type
   2025-12-11 02:46:57  ERROR [MetadataRPC] Failed to parse JSON: unexpected end of JSON input
   2025-12-11 02:46:57  ERROR [MetadataRPC]   - String content: (empty)
   2025-12-11 02:46:57  WARN  Failed to get metadata from instance 172.30.26.245, skipping
   ```
   
   **Result:**
   - ✅ No panic (old code would crash here)
   - ✅ Gracefully skipped this provider
   - ✅ Service remains running
   
   **Test Case 2: Second push after metadata ready (38 seconds later)**
   ```
   2025-12-11 02:47:35  INFO  Received instance notification event of service bo-shop-query-dubbo, instance list size 1
   2025-12-11 02:47:35  INFO  [Registry Directory] selector add service url{tri://172.30.26.245:20880/com.resto.bff.bo.shop.api.rpc.BoShopRpcServiceI?...methods=pageStoreForShopAppPage,getShopInfo,...}
   2025-12-11 02:47:35  INFO  [TRIPLE Protocol] Refer service: tri://172.30.26.245:20880/com.resto.bff.bo.shop.api.rpc.BoShopRpcServiceI
   ```
   
   **Result:**
   - ✅ Metadata successfully retrieved (MetadataInfo object)
   - ✅ Provider `172.30.26.245:20880` successfully added to service directory
   - ✅ Service URL contains complete method list (`pageStoreForShopAppPage`, 
`getShopInfo`, etc.)
   - ✅ Triple protocol invoker created and ready for RPC calls
   - ✅ Service fully operational
   
   **Key Evidence:**
   - Same instance `172.30.26.245:20880` failed at 02:46:57, succeeded at 
02:47:35
   - Service URL shows complete interface methods, proving metadata was parsed 
successfully
   - Automatic recovery within ~38 seconds (typical Nacos push interval: 30s)
   
   ## 📊 Impact
   
   ### Before
   - ❌ Panic crashes entire Go service
   - ❌ No compatibility with Java Dubbo 3.2.4
   - ❌ Service unavailable until manual restart
   
   ### After
   - ✅ No panic - graceful error handling
   - ✅ Compatible with Java Dubbo providers that return either a `MetadataInfo` object or a string (verified with 3.2.4)
   - ✅ Automatic recovery (typically 30-60 seconds)
   - ✅ Clear diagnostic logs
   - ✅ Service remains available
   
   ## 🔍 Related
   
   - Fixes panic when Java Dubbo returns string instead of MetadataInfo
   - Improves compatibility across Java Dubbo versions
   - Adds resilience during Java service startup/restart
   
   ## 📝 Checklist
   
   - [x] Code compiles successfully
   - [x] Tested in production with Java Dubbo 3.2.4
   - [x] Verified automatic recovery mechanism
   - [x] No performance degradation (minimal overhead)
   - [x] Clear error logging for debugging
   
   ---
   
   **Verification**: Successfully running in production with multiple Java 
Dubbo services (bo-shop-query-dubbo, ordering-config-manager-dubbo, 
member-system-dubbo)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

