rdhabalia opened a new issue, #20265:
URL: https://github.com/apache/pulsar/issues/20265

   
   # Motivation
   
   Today, Pulsar exposes a topic's stats and internal stats over the HTTP 
admin API. User applications and internal Pulsar components such as Pulsar 
Functions use this data to derive application state. 
   For example, an application may check the topic's backlog, a 
subscription's state (readPosition, list of subscriptions), 
numberOfEntriesSinceFirstNotAckedMessage, etc. to bootstrap itself or to 
handle resiliency and state changes dynamically. Applications retrieve this 
information through the broker's admin HTTP APIs. 
   
   However, stats retrieval over the HTTP API does not scale well. When a 
large number of application nodes poll it, the load can overwhelm brokers, 
sometimes making a broker unresponsive and degrading admin API performance. 
It is also difficult to use when Pulsar is deployed in the cloud behind an SNI 
proxy and applications must periodically fetch large volumes of stats over a 
separate HTTP port. For scalability and accessibility, it would be better if 
applications could fetch stats over the same binary protocol. 
   
   There are therefore multiple use cases where producer/consumer applications 
need topic stats through the client library over the binary protocol. This PIP 
introduces a client API that lets producers and consumers access topic 
stats/internal-stats information as needed.
   
   
   # Client library changes
   
   The client library will allow both producers and consumers to retrieve stats 
through a `TopicStatsProvider`, which fetches stats/internal-stats for both 
persistent and non-persistent topics. A producer/consumer of a partition can 
reuse its existing broker connection to retrieve that topic's stats. Pulsar 
also lets an application access multiple partitions/topics through a single 
producer/consumer, so the producer/consumer should provide a generic API for 
the stats of the single or multiple topics/partitions that it serves.
   
   Therefore, the producer/consumer will have an additional API that returns a 
`TopicStatsProvider`, which allows the application to fetch the 
stats/internal-stats of all topics/partitions served by that producer/consumer.
   
   ```
   // Producer.java / Consumer.java
   /**
    * Returns {@link TopicStatsProvider} to retrieve stats/internal-stats of the topic.
    */
   TopicStatsProvider getTopicStatsProvider();
   ```
   ```
   // TopicStatsProvider.java
   import java.util.concurrent.CompletableFuture;

   /**
    * TopicStatsProvider provides an API to access a topic's stats and
    * internal-stats information.
    */
   public interface TopicStatsProvider {
   
       /**
        * @return the topic stats
        */
       CompletableFuture<TopicStatsInfo> getStats();
   
       /**
        * @return the internal topic stats
        */
       CompletableFuture<TopicInternalStatsInfo> getInternalStats();
   }
   ```
   
   ```
   // TopicStatsInfo.java
   import java.util.Map;
   
   /**
    * TopicStatsInfo contains the stats information of all partitions of a given
    * topic. It allows the client to access the stats of each individual partition
    * from the partition stats map.
    */
   public class TopicStatsInfo {
       private Map<String, TopicStats> partitions;
   }
   
   // TopicInternalStatsInfo.java
   import java.util.Map;
   
   /**
    * TopicInternalStatsInfo contains the internal-stats information of a topic.
    * It allows the client to access stats of partitioned and non-partitioned topics.
    */
   public class TopicInternalStatsInfo {
       private Map<String, PersistentTopicInternalStats> partitions;
   }
   ```
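
As a rough illustration of how an application might consume the proposed API, the sketch below sums the backlog over every partition returned by a `TopicStatsProvider`. The types are local stand-ins, not the client library's classes: `TopicStats` is reduced to a single `getBacklogSize()` method, `getPartitions()` is a hypothetical accessor (the PIP only shows the private field), and the partition names and numbers are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Stand-in for the proposed interface; only getStats() is modeled here.
interface TopicStatsProvider {
    CompletableFuture<TopicStatsInfo> getStats();
}

// Stand-in reduced to one counter so the sketch stays self-contained.
interface TopicStats {
    long getBacklogSize();
}

class TopicStatsInfo {
    private final Map<String, TopicStats> partitions;

    TopicStatsInfo(Map<String, TopicStats> partitions) {
        this.partitions = partitions;
    }

    // Hypothetical accessor: the PIP does not specify how the map is exposed.
    Map<String, TopicStats> getPartitions() {
        return partitions;
    }
}

class BacklogCheckSketch {
    // Sums the backlog across all partitions served by the producer/consumer.
    static CompletableFuture<Long> totalBacklog(TopicStatsProvider provider) {
        return provider.getStats().thenApply(info ->
                info.getPartitions().values().stream()
                        .mapToLong(TopicStats::getBacklogSize)
                        .sum());
    }

    public static void main(String[] args) {
        Map<String, TopicStats> partitions = new LinkedHashMap<>();
        partitions.put("persistent://public/default/orders-partition-0", () -> 10L);
        partitions.put("persistent://public/default/orders-partition-1", () -> 5L);

        // In-memory provider standing in for consumer.getTopicStatsProvider().
        TopicStatsProvider provider =
                () -> CompletableFuture.completedFuture(new TopicStatsInfo(partitions));

        System.out.println("total backlog = " + totalBacklog(provider).join());
        // prints "total backlog = 15"
    }
}
```

With the real client, the provider would presumably come from `producer.getTopicStatsProvider()` or `consumer.getTopicStatsProvider()`, and the future would complete when the broker responds.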
   
   We would like to create a generic stats protocol between client and broker, 
so that the broker can send any type of stats (e.g. stats or internal-stats) by 
serializing it into a specific format, and the client can deserialize it into 
the appropriate format and return it to the client application.
   Therefore, this PIP will introduce a new wire-protocol command for 
`TopicStats`: the client sends the stats type and topic partition to the 
broker, and the broker sends back a serialized stats response that the client 
deserializes according to the stats type.
   
   ```
   message CommandTopicStats {
       enum StatsType {
           STATS = 0;
           STATS_INTERNAL = 1;
       }
       required uint64 request_id    = 1;
       required string topic_name    = 2;
       required StatsType stats_type = 3;
   }
   
   message CommandTopicStatsResponse {
       required uint64 request_id    = 1;
       optional ServerError error_code    = 2;
       optional string error_message = 3;
       optional string stats_json    = 4;
   }
   ```
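
One way a client could pair each `CommandTopicStatsResponse` with its originating `CommandTopicStats` is to key pending futures by `request_id`. The sketch below is illustrative only and is not taken from the prototype: `StatsRequestTracker`, `register`, and `onResponse` are hypothetical names, the error case is simplified to a nullable message, and the JSON payload is made up.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: tracks in-flight stats requests by request_id.
class StatsRequestTracker {
    private final AtomicLong nextRequestId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Registers a future and returns the request_id to place in CommandTopicStats.
    long register(CompletableFuture<String> future) {
        long requestId = nextRequestId.getAndIncrement();
        pending.put(requestId, future);
        return requestId;
    }

    // Applies the fields of a CommandTopicStatsResponse to the matching future.
    // Assumption: a non-null errorMessage means the broker reported an error.
    // Returns false when no request with that id is pending.
    boolean onResponse(long requestId, String errorMessage, String statsJson) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future == null) {
            return false;
        }
        if (errorMessage != null) {
            future.completeExceptionally(new IllegalStateException(errorMessage));
        } else {
            future.complete(statsJson);
        }
        return true;
    }

    public static void main(String[] args) {
        StatsRequestTracker tracker = new StatsRequestTracker();
        CompletableFuture<String> stats = new CompletableFuture<>();
        long requestId = tracker.register(stats);

        // Simulate the broker's reply arriving on the connection.
        tracker.onResponse(requestId, null, "{\"backlogSize\":15}");
        System.out.println(stats.join());
    }
}
```

The caller would then deserialize `stats_json` into the class that matches the `StatsType` it requested.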
   
   # Broker changes
   
   The broker will handle the stats command for a partition and return the 
stats/internal-stats response for that partition. The broker will check that 
the connected client is authorized to access topic stats and will return the 
stats only once the client is successfully authorized.
   
   This PIP will also limit the number of concurrent stats requests that 
clients and brokers handle, to protect brokers against a large number of such 
requests.
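
The PIP does not say how the concurrency limit is enforced. One possible shape, sketched here under that caveat, is a semaphore that rejects a stats request immediately when too many are already in flight. `StatsRequestLimiter` and its methods are hypothetical names, not part of Pulsar.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical limiter: caps the number of in-flight stats requests.
class StatsRequestLimiter {
    private final Semaphore permits;

    StatsRequestLimiter(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    // Runs the request if a permit is free; otherwise fails fast without queuing.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request) {
        if (!permits.tryAcquire()) {
            CompletableFuture<T> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(
                    new IllegalStateException("too many concurrent stats requests"));
            return rejected;
        }
        CompletableFuture<T> result;
        try {
            result = request.get();
        } catch (RuntimeException e) {
            permits.release(); // do not leak the permit if the request never started
            throw e;
        }
        // Give the permit back once the request finishes, successfully or not.
        result.whenComplete((value, error) -> permits.release());
        return result;
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        StatsRequestLimiter limiter = new StatsRequestLimiter(1);
        CompletableFuture<String> slow = new CompletableFuture<>();
        CompletableFuture<String> first = limiter.submit(() -> slow);
        CompletableFuture<String> second =
                limiter.submit(() -> CompletableFuture.completedFuture("{}"));

        System.out.println("second rejected: " + second.isCompletedExceptionally());
        slow.complete("{}");
        System.out.println("first done: " + first.isDone()
                + ", permits free: " + limiter.available());
    }
}
```

Failing fast rather than queuing keeps a burst of stats requests from piling up behind regular produce/consume traffic; a real implementation might instead queue or return a dedicated error code.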
   
   [Sample Prototype with new API 
support](https://github.com/apache/pulsar/compare/master...rdhabalia:stats_proto?expand=1)
   
   

