nanorth opened a new issue, #2366:
URL: https://github.com/apache/incubator-pegasus/issues/2366

   ## Feature Request
   
   ### **Is your feature request related to a problem? Please describe:**
   
   Currently, there is no built-in version management mechanism for clients, 
leading to the following issues:
   
   - Users must manually verify the client version before using the client.
   - Outdated client versions are still active in production, carrying 
potential bugs that affect system stability.
   - Lack of visibility into client version distribution, making it difficult 
to drive version convergence.
   - Inability to quickly locate IPs using a specific client version.
   
   ### **Describe the feature you'd like:**
   
   I propose enhancing the client `query_cfg` mechanism to automatically report 
client version information with each query. MetaProxy should collect this data 
and expose it for visualization.
   
   Specifically:
   
   - Extend the `query_cfg_request` Thrift struct to include optional fields 
like `client_version`, `client_ip`, and `client_sdk`.
   - MetaProxy should collect and monitor this data upon receiving the request.
   - Use Prometheus + Grafana for aggregation and visualization, enabling 
filtering by cluster, region, table, client type, and version.
   
   ### **Describe alternatives you've considered:**
   
   - We have considered alternative approaches such as:
     - **Active heartbeat mechanism:** This would require introducing new RPC 
interfaces or communication paths, as well as additional client-side logic. It 
may also necessitate deploying new backend components for data collection, 
increasing system complexity and maintenance cost.
     - **Log-based collection:** While feasible, MetaProxy typically runs in 
Kubernetes Pods. To persist and collect logs would require additional log 
collection and storage infrastructure, which introduces unnecessary 
dependencies and operational overhead.
   
   We chose to leverage the existing `query_cfg` requests to implement version 
reporting, because this needs no new RPC or infrastructure and stays 
backward-compatible.
   
   ### **Teachability, Documentation, Adoption, Migration Strategy:**
   
   #### Interface Changes:
   
   We propose extending the existing Thrift struct `query_cfg_request` with 
optional fields. Because the new fields are optional, clients that do not send 
them keep working unchanged, which preserves backward compatibility.
   
   ```thrift
   struct query_cfg_request {
       1: string app_name;
       2: list<i32> partition_indices;
       3: optional string client_version;
       4: optional string client_ip;
       5: optional string client_port;
       6: optional string client_sdk;
   }
   ```
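
As a rough illustration of why optional fields keep older clients working, the sketch below models the extended request in Python. The `QueryCfgRequest` dataclass and the `extract_client_info` helper are hypothetical stand-ins, not the generated Thrift bindings; they only show that a request without the new fields can still be handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryCfgRequest:
    """Hypothetical stand-in for the generated Thrift struct."""
    app_name: str
    partition_indices: List[int] = field(default_factory=list)
    # Proposed optional fields: None means "not sent by this client".
    client_version: Optional[str] = None
    client_ip: Optional[str] = None
    client_port: Optional[str] = None
    client_sdk: Optional[str] = None


def extract_client_info(req: QueryCfgRequest) -> Dict[str, str]:
    """Return only the client fields that were actually set."""
    candidates = {
        "client_version": req.client_version,
        "client_ip": req.client_ip,
        "client_port": req.client_port,
        "client_sdk": req.client_sdk,
    }
    return {name: value for name, value in candidates.items() if value is not None}


# An older client sends only the original fields.
old_client = QueryCfgRequest(app_name="test")

# A newer client also fills in the proposed fields.
new_client = QueryCfgRequest(
    app_name="test",
    client_version="2.3.8",
    client_ip="192.168.1.100",
    client_port="1234",
    client_sdk="pegasus-java-client",
)
```

With this shape, the receiving side can skip metric collection when `extract_client_info` returns an empty dict instead of rejecting the request.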
   
   #### Client Version Maintenance and Retrieval
   
   | Language | Version Maintenance Mechanism                      | Version Retrieval Method                        |
   | -------- | -------------------------------------------------- | ----------------------------------------------- |
   | Java     | Defined in `<version>` tag in `pom.xml`            | Injected into code via Maven resource filtering |
   | C++      | Defined via macro `PEGASUS_VERSION` in `version.h` | Read directly from file                         |
   | Go       | Version field in `go.mod` + Git commit hash        | Parsed using `runtime/debug.ReadBuildInfo()`    |
   | Python   | `__version__` variable in `__init__.py`            | Retrieved using `importlib.metadata.version()`  |
   | Scala    | `version` configured in `.sbt`                     | Generated into `BuildInfo.scala` during build   |
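
For the Python row, a minimal sketch of the retrieval step might look like the following. The helper name `get_client_version` and the distribution name `pypegasus` are assumptions made for illustration. Note that `importlib.metadata.version()` reads the installed distribution's metadata, so the reported value is whatever the packaging step recorded.

```python
from importlib.metadata import PackageNotFoundError, version


def get_client_version(dist_name: str, default: str = "unknown") -> str:
    """Look up the installed version of a distribution, or fall back to a default."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The package is not installed (for example, running from a source checkout).
        return default


# "pypegasus" is an assumed distribution name used only for illustration.
client_version = get_client_version("pypegasus")
```

Falling back to a default keeps the client usable when no metadata is available, at the cost of reporting an uninformative version string.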
   
   #### Metrics Tag Format
   
   For metrics collection and monitoring, MetaProxy will record the following 
fields upon receiving a topology query request:
   
   ```
   {
     "client_version": "2.3.8",
     "client_ip": "192.168.1.100",
     "client_port": "1234",
     "client_sdk": "pegasus-java-client",
     "timestamp": 1769754743,
     "table_name": "test",
     "cluster_name": "aktst-function1",
     "region": "c3"
   }
   ```
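
To make the aggregation concrete, the sketch below counts distinct client IPs per label combination from records shaped like the one above. It uses plain Python rather than a Prometheus client library; the `LABELS` tuple, the `count_unique_clients` helper, and every sample record other than the first are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Fields assumed to act as grouping labels on the dashboard.
LABELS = ("cluster_name", "region", "table_name", "client_sdk", "client_version")


def count_unique_clients(records: Iterable[dict]) -> Dict[Tuple[str, ...], int]:
    """Count distinct client IPs for each combination of label values."""
    ips: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for rec in records:
        key = tuple(rec.get(name, "unknown") for name in LABELS)
        ips[key].add(rec["client_ip"])
    return {key: len(seen) for key, seen in ips.items()}


base = {
    "client_sdk": "pegasus-java-client",
    "table_name": "test",
    "cluster_name": "aktst-function1",
    "region": "c3",
}
sample_records = [
    {**base, "client_version": "2.3.8", "client_ip": "192.168.1.100"},
    {**base, "client_version": "2.3.8", "client_ip": "192.168.1.101"},
    {**base, "client_version": "2.3.8", "client_ip": "192.168.1.100"},  # repeat query
    {**base, "client_version": "2.2.0", "client_ip": "192.168.1.102"},
]

counts = count_unique_clients(sample_records)
```

Repeated queries from the same IP are counted once, which matches the "unique client IP count" idea rather than a raw request count.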
   
   #### Dashboard Features:
   
   We envision the following visualizations and features in the Grafana 
dashboard backed by Prometheus metrics:
   
   - **Overview:**
     - Unique client IP count over time.
     - Client type distribution (e.g. Java, C++, Go).
     - Client version distribution.
   - **Detail Query:**
     - Filter by region, cluster, table, or client type.
     - Time-based aggregation.
     - Export IP list of clients using a specific version.
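
The "export IP list" feature could be sketched as a simple filter over collected records. The `ips_for_version` helper and the sample records are hypothetical; a real dashboard would presumably run an equivalent query against the metrics backend.

```python
from typing import Iterable, List


def ips_for_version(records: Iterable[dict], client_version: str, **filters: str) -> List[str]:
    """Return sorted, de-duplicated IPs of clients reporting `client_version`.

    Extra keyword arguments (e.g. region="c3") must match the record exactly.
    """
    matched = set()
    for rec in records:
        if rec.get("client_version") != client_version:
            continue
        if all(rec.get(name) == value for name, value in filters.items()):
            matched.add(rec["client_ip"])
    return sorted(matched)


# Illustrative sample data, not real measurements.
records = [
    {"client_ip": "192.168.1.100", "client_version": "2.3.8", "region": "c3", "table_name": "test"},
    {"client_ip": "192.168.1.101", "client_version": "2.3.8", "region": "c4", "table_name": "test"},
    {"client_ip": "192.168.1.100", "client_version": "2.3.8", "region": "c3", "table_name": "test"},
    {"client_ip": "192.168.1.102", "client_version": "2.2.0", "region": "c3", "table_name": "test"},
]

all_238 = ips_for_version(records, "2.3.8")
c3_238 = ips_for_version(records, "2.3.8", region="c3")
```

Passing the filters as keyword arguments mirrors the dashboard's region, cluster, table, and client type selectors without fixing the set of filterable fields in advance.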
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

