General mechanism for fetching metrics via RPC sounds like a good
idea. Especially since Kafka has clients in many languages, but
support for JMX is not wide-spread outside the Java ecosystem.
Command-line script seems like a bad fit for use-cases where you want
the clients themselves to use metric information about the broker to
drive their behavior. Or even for something like a Kubernetes
controller.

Gwen

On Mon, Nov 18, 2019 at 4:41 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> Hi,
>
> In general, metrics are a "broker status API" telling you important things 
> about the state of the broker, its performance, etc. etc., right?  What 
> argument is there for creating a separate API for this particular metric?  If 
> you argue that JMX is not convenient, it seems like that would also apply to 
> any other metric we expose.
>
> Perhaps we should just add an easy command-line script for querying the JMX 
> interface?  Or a more general mechanism for fetching metrics via Kafka RPC.
>
> best,
> Colin
>
>
> On Mon, Nov 18, 2019, at 10:54, Jason Gustafson wrote:
> > Hi Noa,
> >
> > Re; uptime. Sure, it was just a suggestion. However, we should be clear
> > that there are actually two timestamps at play. Your initial proposal
> > suggested using the timestamp from the registration znode. However, this
> > changes each time the broker loses its session. It does not reflect the
> > actual broker start time, which seems to be what you are interested in. So
> > possibly it is worth exposing both the true broker start time as well as
> > its session creation time.
> >
> > -Jason
> >
> >
> >
> > On Mon, Nov 18, 2019 at 7:48 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi Noa,
> > >
> > > I understand the desire for batching. However, once you do that, you 
> > > either
> > > need request forwarding or broker metadata propagation. It's worth
> > > exploring, but I suspect you will end up with most of the changes needed 
> > > by
> > > the original proposal, in that case.
> > >
> > > Ismael
> > >
> > > On Mon, Nov 18, 2019 at 7:07 AM Noa Resare <n...@resare.com> wrote:
> > >
> > > > Hi Jason & Gwen,
> > > >
> > > > Sure, this would solve our use case. I do have two questions, however:
> > > >
> > > > Moving from start time to uptime in theory would neatly side step the
> > > > clock skew problem,
> > > > but in practice I’m not sure it helps that much as the straightforward
> > > way
> > > > of going about
> > > > implementing this by returning (now - startTime) would break when the
> > > > clock on the broker
> > > > is adjusted. So, I think that startTime is more straight forward and
> > > > doesn’t hide the fact that
> > > > the system clock is sometimes off.
> > > >
> > > > Another thing I think would be useful is to build support for requesting
> > > > multiple (or all?)
> > > > brokers in a single request at the start. Having the request hold a set
> > > of
> > > > brokerIds
> > > > and return a set of brokers in the response, adding a brokerId field to
> > > > identify which
> > > > broker a specific broker status response is associated with seems
> > > straight
> > > > forward.
> > > >
> > > > I’ll start writing a proposal so that we have something concrete to
> > > > discuss.
> > > >
> > > > /noa
> > > >
> > > > > On 13 Nov 2019, at 16:43, Jason Gustafson <ja...@confluent.io> wrote:
> > > > >
> > > > > Ni Noa,
> > > > >
> > > > > Thanks for the KIP. I agree with the concerns about Metadata bloat. In
> > > > > fact, I have wanted a BrokerStatus API for a while now. Perhaps this
> > > is a
> > > > > good excuse to introduce it. I was thinking something like this:
> > > > >
> > > > > BrokerStatusRequest => BrokerId
> > > > >
> > > > > BrokerStatusResponse =>
> > > > >  Listeners => [ListenerName Host Port]
> > > > >  RackId
> > > > >  BrokerEpoch
> > > > >  BrokerState => {ACTIVE, RECOVERING, SHUTTING_DOWN}
> > > > >  UpTime
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Tue, Nov 5, 2019 at 12:25 PM Noa Resare <n...@resare.com> wrote:
> > > > >
> > > > >> I agree with that. And looking at the MetadataResponse fields it 
> > > > >> seems
> > > > >> there has been some feature creep already here. Does the client use
> > > rack
> > > > >> information, for example?
> > > > >>
> > > > >> I guess one could do this either by introducing a new leaner message
> > > > pair,
> > > > >> used specifically enable client operation, and use
> > > > >> MetadataRequest/MetadataResponse for describeCluster() or one could
> > > > shrink
> > > > >> MetadataRequest/MetadataResponse and introduce a new more fully
> > > featured
> > > > >> message pair for the other stuff. I would be happy to spend some time
> > > > >> looking into implementing this if there is an interest.
> > > > >>
> > > > >> /noa
> > > > >>
> > > > >>> On 5 Nov 2019, at 15:43, Gwen Shapira <g...@confluent.io> wrote:
> > > > >>>
> > > > >>> It isn't just about saving space. It increases complexity to default
> > > to
> > > > >>> always sharing a bit of information that is really only needed in a
> > > > >> single
> > > > >>> use-case.
> > > > >>> We avoid doing this as a matter of good protocol design.
> > > > >>> Arguably, this should not really piggyback on cluster metadata at
> > > all,
> > > > >>> since the usage is so different.
> > > > >>>
> > > > >>> On Tue, Nov 5, 2019, 7:29 AM Noa Resare <n...@resare.com> wrote:
> > > > >>>
> > > > >>>> It would certainly be possible to have the field be optional and
> > > only
> > > > >>>> include it if some flag is set in the DescribeClusterOptions
> > > instance
> > > > >>>> passed to Admin.describeCluster() that in turn would translate to a
> > > > >> boolean
> > > > >>>> in MetadataRequest indicating that we are asking for this piece of
> > > > >>>> information.
> > > > >>>>
> > > > >>>> I’m not entirely sure that this extra complexity would be worth it
> > > for
> > > > >> the
> > > > >>>> modestly smaller response size, in a message that already contains
> > > > >> things
> > > > >>>> like the hostname and rack identifier per broker.
> > > > >>>>
> > > > >>>> /noa
> > > > >>>>
> > > > >>>>> On 4 Nov 2019, at 14:45, Gwen Shapira <g...@confluent.io> wrote:
> > > > >>>>>
> > > > >>>>> Cluster metadata is sent to clients on a very regular basis. 
> > > > >>>>> Adding
> > > > >>>>> start-time there seems quite repetitive. Especially considering
> > > that
> > > > >>>>> this information is only useful in very specific cases.
> > > > >>>>>
> > > > >>>>> Can we add this capability to the Admin API in a way that won't
> > > > impact
> > > > >>>>> normal client workflow?
> > > > >>>>>
> > > > >>>>> On Mon, Nov 4, 2019 at 4:05 AM Noa Resare <n...@resare.com> wrote:
> > > > >>>>>>
> > > > >>>>>> Thank you for the feedback, Stanislav!
> > > > >>>>>>
> > > > >>>>>> I agree that it would be good to harmonise the naming, and
> > > > >>>> start-time-ms definitely more descriptive.
> > > > >>>>>>
> > > > >>>>>> I have updated the proposal to reflect this, and also added the
> > > > >> updated
> > > > >>>> json RPC changes. Please have a look.
> > > > >>>>>>
> > > > >>>>>> /noa
> > > > >>>>>>
> > > > >>>>>>> On 1 Nov 2019, at 09:13, Stanislav Kozlovski <
> > > > stanis...@confluent.io
> > > > >>>
> > > > >>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>> Hey Noa,
> > > > >>>>>>>
> > > > >>>>>>> KIP-436 added a JMX metric in Kafka for this exact use-case,
> > > called
> > > > >>>>>>> `start-time-ms`. Perhaps it would be useful to name this public
> > > > >>>> interface
> > > > >>>>>>> in the same way for consistency.
> > > > >>>>>>>
> > > > >>>>>>> Could you update the KIP to include the specific RPC changes
> > > > >> regarding
> > > > >>>> the
> > > > >>>>>>> metadata request/responses? Here is a recent example of how to
> > > > >> portray
> > > > >>>> the
> > > > >>>>>>> changes -
> > > > >>>>>>>
> > > > >>>>
> > > > >>
> > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-525+-+Return+topic+metadata+and+configs+in+CreateTopics+response
> > > > >>>>>>>
> > > > >>>>>>> Thanks,
> > > > >>>>>>> Stanislav!
> > > > >>>>>>>
> > > > >>>>>>> On Mon, Oct 14, 2019 at 2:46 PM Noa Resare <n...@resare.com>
> > > wrote:
> > > > >>>>>>>
> > > > >>>>>>>> We are in the process of migrating the pieces of automation 
> > > > >>>>>>>> that
> > > > >>>> currently
> > > > >>>>>>>> reads and modifies zookeeper state to use the Admin API.
> > > > >>>>>>>>
> > > > >>>>>>>> One of the things that we miss doing this is access to the 
> > > > >>>>>>>> start
> > > > >> time
> > > > >>>> of
> > > > >>>>>>>> brokers in a cluster which is used by our automation doing
> > > rolling
> > > > >>>>>>>> restarts. We currently read this from the timestamp field from
> > > the
> > > > >>>>>>>> epehmeral broker znodes in zookeeper. To address this
> > > limitation,
> > > > I
> > > > >>>> have
> > > > >>>>>>>> authored KIP-536, that proposes adding a timestamp field to the
> > > > Node
> > > > >>>> class
> > > > >>>>>>>> that the AdminClient.describeCluster() method returns.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>
> > > > >>
> > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-536%3A+Propagate+broker+timestamp+to+Admin+API
> > > > >>>>>>>> <
> > > > >>>>>>>>
> > > > >>>>
> > > > >>
> > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-536:+Propagate+broker+timestamp+to+Admin+API
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Any and all feedback is most welcome
> > > > >>>>>>>>
> > > > >>>>>>>> cheers
> > > > >>>>>>>> noa
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Best,
> > > > >>>>>>> Stanislav
> > > > >>>>>>
> > > > >>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >

Reply via email to