Hi Brian,

Thanks for continuing to work on this.  It looks good overall.

It's probably better to keep the discussion on this thread rather than going 
back and forth between this and the DISCUSS thread.

The KIP states:

 > For (1), an RPC is generated every time an uncached topic's metadata must 
 > be fetched. During periods when a large number of uncached topics are 
 > processed (e.g. producer startup), a large number of RPCs may be sent out 
 > to the controller in a short period of time

Metadata requests don't (always) go to the controller, right?  We should fix 
the wording in this section.

I feel like "Proposed Changes" should come before "Public Interfaces" here.  
The new configuration won't make sense to the reader until he or she has read 
the "changes" section.  Also, it's not clear from the name that "metadata 
evict" refers to a span of time.  What do you think about 
"metadata.eviction.period.ms" as a configuration name?

 > Let metadataRefreshSecs = metadata.max.age.ms / 1000
 >
 > Set topicsPerSec = <number of cached topics> / metadataRefreshSecs
 >
 > Set targetMetadataFetchSize = Math.max(topicsPerSec * 10, 20)
 >
 > Rationale: this sets the target size to be approximate a metadata refresh 
 > at least every 10 seconds, while also maintaining a reasonable batch size 
 > of '20' for setups with a lower number of topics. '20' has no significance 
 > other than it's a small-but-appropriate trade-off between RPC metadata 
 > response size and necessary RPC frequency.

Where is the "10 seconds" coming from here?  The current default for 
metadata.max.age.ms is 5 * 60 * 1000 ms, which implies that we want to refresh 
every 5 minutes.  Definitely not every 10 seconds.
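
Just to spell out the arithmetic with the current default (a quick sketch in 
Java; the 10 and 20 constants are the ones from the KIP text above, and the 
1,000-topic working set is only an illustrative number):

public class TargetFetchSizeSketch {
    public static void main(String[] args) {
        long metadataMaxAgeMs = 5 * 60 * 1000;               // current default: 300,000 ms
        long metadataRefreshSecs = metadataMaxAgeMs / 1000;  // 300
        int cachedTopics = 1_000;                            // illustrative only
        double topicsPerSec = (double) cachedTopics / metadataRefreshSecs;   // ~3.33
        double targetMetadataFetchSize = Math.max(topicsPerSec * 10, 20);    // ~33
        System.out.printf("refreshSecs=%d topicsPerSec=%.2f targetFetchSize=%.0f%n",
                metadataRefreshSecs, topicsPerSec, targetMetadataFetchSize);
    }
}

So with the defaults, each batch covers roughly 10 seconds' worth of the 
refresh schedule, which is why I'm asking where the 10 comes from.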

Stepping back a little bit, it seems like the big problem you identified is the 
O(N^2) behavior of producing to X, then Y, then Z, etc. etc. where each new 
produce to a fresh topic triggers a metadata request with all the preceding 
topics included.  
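
To make that concrete, here's a toy sketch (not the actual producer code) of 
the growth when every request triggered by a newly-seen topic also carries all 
previously-seen topics:

public class MetadataStormSketch {
    public static void main(String[] args) {
        int newTopics = 1_000;      // illustrative: single-threaded startup over 1K fresh topics
        long totalTopicEntries = 0;
        for (int produced = 1; produced <= newTopics; produced++) {
            // The request for the produced-th fresh topic carries all
            // 'produced' topics seen so far.
            totalTopicEntries += produced;
        }
        // Prints requests=1000 totalTopicEntries=500500, i.e. N*(N+1)/2.
        System.out.println("requests=" + newTopics + " totalTopicEntries=" + totalTopicEntries);
    }
}

That's the 1 + 2 + ... + 1K payload growth you described earlier in the thread.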

Of course we need to send out a metadata request before producing to X, then Y, 
then Z, but that request could just specify X, or just specify Y, etc.  In 
other words, we could decouple the routine metadata fetch which happens on a 
5 minute timer from the need to fetch metadata for something specific right 
now.
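
As a rough sketch of the decoupling I have in mind (hypothetical names only, 
not the existing Metadata/MetadataUpdater internals):

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DecoupledFetchSketch {
    // Uncached topic: the urgent request names only the topic we're blocked on.
    static Set<String> urgentRequest(String newTopic) {
        return Collections.singleton(newTopic);
    }

    // Routine refresh on the metadata.max.age.ms timer: batch whatever is due.
    static Set<String> routineRequest(Map<String, Long> lastRefreshMs, long nowMs, long maxAgeMs) {
        Set<String> due = new HashSet<>();
        for (Map.Entry<String, Long> e : lastRefreshMs.entrySet()) {
            if (nowMs - e.getValue() >= maxAgeMs) {
                due.add(e.getKey());
            }
        }
        return due;
    }

    public static void main(String[] args) {
        System.out.println(urgentRequest("topic-X"));                   // [topic-X]
        Map<String, Long> cache = new HashMap<>();
        cache.put("topic-A", 0L);
        cache.put("topic-B", 250_000L);
        System.out.println(routineRequest(cache, 300_000L, 300_000L));  // [topic-A]
    }
}

The urgent path never grows with the size of the working set, and the routine 
path can still batch as aggressively as we like on the timer.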

I guess my question is, do we really need to allow routine metadata fetches to 
"piggyback" on the emergency metadata fetches?  It adds a lot of complexity, 
and we don't have any benchmarks proving that it's better.  Also, as I 
understand it, whether we piggyback or not, the number of metadata fetches is 
the same, right?

best,
Colin


On Mon, Jan 6, 2020, at 10:26, Lucas Bradstreet wrote:
> +1 (non-binding)
> 
> On Thu, Jan 2, 2020 at 11:15 AM Brian Byrne <bby...@confluent.io> wrote:
> 
> > Hello all,
> >
> > After further discussion and improvements, I'd like to reinstate the voting
> > process.
> >
> > The updated KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics>
> >
> > The continued discussion:
> >
> > https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
> >
> > I'd be happy to address any further comments/feedback.
> >
> > Thanks,
> > Brian
> >
> > On Mon, Dec 9, 2019 at 11:02 PM Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > With the concluded summary on the other discussion thread, I'm +1 on the
> > > proposal.
> > >
> > > Thanks Brian!
> > >
> > > On Tue, Nov 19, 2019 at 8:00 PM deng ziming <dengziming1...@gmail.com>
> > > wrote:
> > >
> > > > >
> > > > > For new (uncached) topics, one problem here is that we don't know
> > which
> > > > > partition to map a record to in the event that it has a key or custom
> > > > > partitioner, so the RecordAccumulator wouldn't know which
> > batch/broker
> > > it
> > > > > belongs. We'd need an intermediate record queue that subsequently
> > moved
> > > > the
> > > > > records into RecordAccumulators once metadata resolution was
> > complete.
> > > > For
> > > > > known topics, we don't currently block at all in waitOnMetadata.
> > > > >
> > > >
> > > > You are right, I forget this fact, and the intermediate record queue
> > will
> > > > help, but I have some questions
> > > >
> > > > if we add an intermediate record queue in KafkaProducer, when should we
> > > > move the records into RecordAccumulators?
> > > > only NetworkClient is aware of the MetadataResponse, here is the
> > > > hierarchical structure of the related classes:
> > > > KafkaProducer
> > > >     Accumulator
> > > >     Sender
> > > >         NetworkClient
> > > >             metadataUpdater.handleCompletedMetadataResponse
> > > >
> > > > so
> > > > 1. we should also add a metadataUpdater to KafkaProducer?
> > > > 2. if the topic really does not exists? the intermediate record queue
> > > will
> > > > become too large?
> > > > 3. and should we `block` when the intermediate record queue is too
> > large?
> > > > and this will again bring the blocking problem?
> > > >
> > > >
> > > >
> > > > On Wed, Nov 20, 2019 at 12:40 AM Brian Byrne <bby...@confluent.io>
> > > wrote:
> > > >
> > > > > Hi Deng,
> > > > >
> > > > > Thanks for the feedback.
> > > > >
> > > > > On Mon, Nov 18, 2019 at 6:56 PM deng ziming <
> > dengziming1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > hi, I reviewed the current code, the ProduceMetadata maintains an
> > > > expiry
> > > > > > threshold for every topic, every time when we write to a topic we
> > > will
> > > > > set
> > > > > > the expiry time to -1 to indicate it should be updated, this does
> > > work
> > > > to
> > > > > > reduce the size of the topic working set, but the producer will
> > > > continue
> > > > > > fetching metadata for these topics in every metadata request for
> > the
> > > > full
> > > > > > expiry duration.
> > > > > >
> > > > >
> > > > > Indeed, you are correct, I terribly misread the code here.
> > Fortunately
> > > > this
> > > > > was only a minor optimization in the KIP that's no longer necessary.
> > > > >
> > > > >
> > > > > and we can improve the situation by 2 means:
> > > > > >     1. we maintain a refresh threshold for every topic which is for
> > > > > example
> > > > > > 0.8 * expiry_threshold, and when we send `MetadataRequest` to
> > brokers
> > > > we
> > > > > > just request unknownLeaderTopics + unknownPartitionTopics + topics
> > > > > > reach refresh threshold.
> > > > > >
> > > > >
> > > > > Right, this is similar to what I suggested, with a larger window on
> > the
> > > > > "staleness" that permits for batching to an appropriate size (except
> > if
> > > > > there's any unknown topics, you'd want to issue the request
> > > immediately).
> > > > >
> > > > >
> > > > >
> > > > > >     2. we don't invoke KafkaProducer#waitOnMetadata when we call
> > > > > > KafkaProducer#send because of we just send data to
> > RecordAccumulator,
> > > > and
> > > > > > before we send data to brokers we will invoke
> > > > RecordAccumulator#ready(),
> > > > > so
> > > > > > we can only invoke waitOnMetadata to block when (number topics
> > > > > > reach refresh threshold)>(number of all known topics)*0.2.
> > > > > >
> > > > >
> > > > > For new (uncached) topics, one problem here is that we don't know
> > which
> > > > > partition to map a record to in the event that it has a key or custom
> > > > > partitioner, so the RecordAccumulator wouldn't know which
> > batch/broker
> > > it
> > > > > belongs. We'd need an intermediate record queue that subsequently
> > moved
> > > > the
> > > > > records into RecordAccumulators once metadata resolution was
> > complete.
> > > > For
> > > > > known topics, we don't currently block at all in waitOnMetadata.
> > > > >
> > > > > The last major point of minimizing producer startup metadata RPCs may
> > > > still
> > > > > need to be improved, but this would be a large improvement on the
> > > current
> > > > > situation.
> > > > >
> > > > > Thanks,
> > > > > Brian
> > > > >
> > > > >
> > > > >
> > > > > > I think the above 2 ways are enough to solve the current problem.
> > > > > >
> > > > > > On Tue, Nov 19, 2019 at 3:20 AM Colin McCabe <cmcc...@apache.org>
> > > > wrote:
> > > > > >
> > > > > > > On Mon, Nov 18, 2019, at 10:05, Brian Byrne wrote:
> > > > > > > > On Fri, Nov 15, 2019 at 5:08 PM Colin McCabe <
> > cmcc...@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Two seconds doesn't seem like a reasonable amount of time to
> > > > leave
> > > > > > for
> > > > > > > the
> > > > > > > > > metadata fetch.  Fetching halfway through the expiration
> > period
> > > > > seems
> > > > > > > more
> > > > > > > > > reasonable.  It also doesn't require us to create a new
> > > > > configuration
> > > > > > > key,
> > > > > > > > > which is nice.
> > > > > > > > >
> > > > > > > > > Another option is to just do the metadata fetch every
> > > > > > > metadata.max.age.ms,
> > > > > > > > > but not expire the topic until we can't fetch the metadata
> > for
> > > 2
> > > > *
> > > > > > > > > metadata.max.age.ms.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'd expect two seconds to be reasonable in the common case.
> > Keep
> > > in
> > > > > > mind
> > > > > > > > that this doesn't affect correctness, and a control operation
> > > > > returning
> > > > > > > > cached metadata should be on the order of milliseconds.
> > > > > > > >
> > > > > > >
> > > > > > > Hi Brian,
> > > > > > >
> > > > > > > Thanks again for the KIP.
> > > > > > >
> > > > > > > I think the issue here is not the common case, but the uncommon
> > > case
> > > > > > where
> > > > > > > the metadata fetch takes longer than expected.  In that case, we
> > > > don't
> > > > > > want
> > > > > > > to be in the position of having our metadata expire because we
> > > waited
> > > > > too
> > > > > > > long to renew it.
> > > > > > >
> > > > > > > This is one reason why I think that the metadata expiration time
> > > > should
> > > > > > be
> > > > > > > longer than the metadata refresh time.  In fact, it might be
> > worth
> > > > > having
> > > > > > > two separate configuration keys for these two values.  I could
> > > > imagine
> > > > > a
> > > > > > > user who is having trouble with metadata expiration wanting to
> > > > increase
> > > > > > the
> > > > > > > metadata expiration time, but without increasing the metadata
> > > refresh
> > > > > > > period.  In a sense, the metadata expiration time is like the ZK
> > > > > session
> > > > > > > expiration time.  You might want to turn it up if the cluster is
> > > > > > > experiencing load spikes.
> > > > > > >
> > > > > > > >
> > > > > > > > But to the general
> > > > > > > > point, defining the algorithm would mean enforcing it to fair
> > > > > accuracy,
> > > > > > > > whereas if the suggestion is that it'll be performed at a
> > > > reasonable
> > > > > > > time,
> > > > > > > > it allows for batching and other optimizations. Perhaps I
> > > shouldn't
> > > > > be
> > > > > > > > regarding what's defined in a KIP to be contractual in these
> > > cases,
> > > > > but
> > > > > > > you
> > > > > > > > could consider a first implementation to collect topics whose
> > > > > metadata
> > > > > > > has
> > > > > > > > exceeded (metadata.max.age.ms / 2), and sending the batch
> > once a
> > > > > > > > constituent topic's metadata is near the expiry, or a
> > sufficient
> > > > > number
> > > > > > > of
> > > > > > > > topics have been collected (10? 100? 1000?).
> > > > > > > >
> > > > > > >
> > > > > > > I'm concerned that if we change the metadata caching strategy
> > > without
> > > > > > > discussing it first, it may improve certain workloads but make
> > > others
> > > > > > > worse.  We need to be concrete about what the proposed strategy
> > is
> > > so
> > > > > > that
> > > > > > > we can really evaluate it.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > We should be specific about what happens if the first few
> > > > metadata
> > > > > > > fetches
> > > > > > > > > fail.  Do we use exponential backoff to decide when to
> > resend?
> > > > It
> > > > > > > seems
> > > > > > > > > like we really should, for all the usual reasons (reduce the
> > > load
> > > > > on
> > > > > > > > > brokers, ride out temporary service disruptions, etc.)  Maybe
> > > we
> > > > > > could
> > > > > > > have
> > > > > > > > > an exponential retry backoff for each broker (in other words,
> > > we
> > > > > > > should try
> > > > > > > > > to contact a different broker before applying the backoff.)
> > I
> > > > > think
> > > > > > > this
> > > > > > > > > already sort of happens with the disconnect timeout, but we
> > > might
> > > > > > need
> > > > > > > a
> > > > > > > > > more general solution.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I don't plan to change this behavior. Currently it retries
> > after
> > > a
> > > > > > fixed
> > > > > > > > value of 'retry.backoff.ms' (defaults to 100 ms). It's
> > possible
> > > > that
> > > > > > > > different brokers are attempted, but I haven't dug into it.
> > > > > > > >
> > > > > > >
> > > > > > > I think it's critical to understand what the current behavior is
> > > > before
> > > > > > we
> > > > > > > try to change it.  The difference between retrying the same
> > broker
> > > > and
> > > > > > > trying a different one has a large impact it has on cluster load
> > > and
> > > > > > > latency.  For what it's worth, I believe the behavior is the
> > second
> > > > > one,
> > > > > > > but it has been a while since I checked.  Let's figure this out.
> > > > > > >
> > > > > > > >
> > > > > > > > > Thanks for the clarification.  Fully asynchronous is the way
> > to
> > > > > go, I
> > > > > > > > > agree.  I'm having trouble understanding how timeouts are
> > > handled
> > > > > in
> > > > > > > the
> > > > > > > > > KIP.  It seems like if we can't fetch the metadata within the
> > > > > > > designated
> > > > > > > > > metadata timeout, the future / callback should receive a
> > > > > > > TimeoutException
> > > > > > > > > right?  We do not want the send call to be deferred forever
> > if
> > > > > > metadata
> > > > > > > > > can't be fetched.  Eventually it should fail if it can't be
> > > > > > performed.
> > > > > > > > >
> > > > > > > > > I do think this is something that will have to be mentioned
> > in
> > > > the
> > > > > > > > > compatibility section.  There is some code out there that is
> > > > > probably
> > > > > > > > > prepared to handle a timeout exception from the send
> > function,
> > > > > which
> > > > > > > may
> > > > > > > > > need to be updated to check for a timeout from the future or
> > > > > > callback.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Correct, a timeout exception would be delivered in the future.
> > > > Sure,
> > > > > I
> > > > > > > can
> > > > > > > > add that note to the KIP.
> > > > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > It seems like this is an existing problem.  You may fire off
> > a
> > > > lot
> > > > > of
> > > > > > > send
> > > > > > > > > calls that get blocked because the broker that is the leader
> > > for
> > > > a
> > > > > > > certain
> > > > > > > > > partition is not responding.  I'm not sure that we need to do
> > > > > > anything
> > > > > > > > > special here.  On the other hand, we could make the case for
> > a
> > > > > > generic
> > > > > > > "max
> > > > > > > > > number of outstanding sends" configuration to prevent
> > surprise
> > > > OOMs
> > > > > > in
> > > > > > > the
> > > > > > > > > existing cases, plus the new one we're adding.  But this
> > feels
> > > > > like a
> > > > > > > bit
> > > > > > > > > of a scope expansion.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Right, this is an existing problem, however the asynchronous
> > send
> > > > > could
> > > > > > > > cause unexpected behavior. For example, if a client pinned
> > > > > > > > topics/partitions to individual send threads, then memory
> > > couldn't
> > > > be
> > > > > > > > exhausted by a single topic since a blocking send would prevent
> > > > > further
> > > > > > > > records from being buffered on that topic. The compromise could
> > > be
> > > > > that
> > > > > > > we
> > > > > > > > only ever permit one outstanding record batch for a topic,
> > which
> > > > > would
> > > > > > > keep
> > > > > > > > the code simple and wouldn't permit a single topic to consume
> > all
> > > > > > > available
> > > > > > > > memory.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > They may be connected, but I'm not sure they should be the
> > > same.
> > > > > > > Perhaps
> > > > > > > > > expiry should be 4x the typical fetch rate, for example.
> > > > > > > > >
> > > > > > > >
> > > > > > > > That's true. You could also make the case for a faster expiry
> > > than
> > > > > > > refresh
> > > > > > > > as well. Makes sense to separate this out.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Hmm.... are you sure this is an N^2 problem?  If you have T1
> > > and
> > > > > T2,
> > > > > > > you
> > > > > > > > > fetch metadata for T1 and T2, right?  I guess you could argue
> > > > that
> > > > > we
> > > > > > > often
> > > > > > > > > fetch metadata for partitions we don't care about, but that
> > > > doesn't
> > > > > > > make it
> > > > > > > > > O(N^2).  Maybe there's something about the implementation
> > that
> > > > I'm
> > > > > > > missing.
> > > > > > > > >
> > > > > > > >
> > > > > > > > My apologies, I left out the context. One issue the KIP is
> > trying
> > > > to
> > > > > > > > resolve is the metadata storm that's caused by producers
> > starting
> > > > up.
> > > > > > In
> > > > > > > > the worst case, where the send call is only performed from a
> > > single
> > > > > > > thread
> > > > > > > > (i.e. no possible batching), fetching metadata for 1K topics
> > will
> > > > > > > generate
> > > > > > > > 1K RPCs, with payload 1+2+...+1K topics' metadata. Being smart
> > > > about
> > > > > > the
> > > > > > > > topics being refreshed would still generate 1K RPCs for 1 topic
> > > > each,
> > > > > > and
> > > > > > > > asynchronous behavior would permit batching (note steady-state
> > > > > > refreshing
> > > > > > > > doesn't require the asynchronous behavior to batch).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > In general, we need to take advantage of batching to do this
> > > well
> > > > > > (one
> > > > > > > > > reason why I think we should steer clear of ultra-granular
> > > > timeout
> > > > > > > > > tracking).  It's best to do 1 RPC asking for 10 topics, not
> > 10
> > > > RPCs
> > > > > > > asking
> > > > > > > > > for a single topic each, even if that means some of the topic
> > > > > > timeouts
> > > > > > > are
> > > > > > > > > not *exactly* aligned with the configured value.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Absolutely, agreed.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Brian
> > > > > > > >
> > > > > > > >
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Brian
> > > > > > > > > >
> > > > > > > > > > On Mon, Nov 11, 2019 at 11:47 AM Colin McCabe <
> > > > > cmcc...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Brian,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP.
> > > > > > > > > > >
> > > > > > > > > > > Starting the metadata fetch before we need the result is
> > > > > > > definitely a
> > > > > > > > > > > great idea.  This way, the metadata fetch can be done in
> > > > > parallel
> > > > > > > with
> > > > > > > > > all
> > > > > > > > > > > the other stuff e producer is doing, rather than forcing
> > > the
> > > > > > > producer
> > > > > > > > > to
> > > > > > > > > > > periodically come to a halt periodically while metadata
> > is
> > > > > > fetched.
> > > > > > > > > > >
> > > > > > > > > > > Maybe I missed it, but there seemed to be some details
> > > > missing
> > > > > > > here.
> > > > > > > > > When
> > > > > > > > > > > do we start the metadata fetch?  For example, if topic
> > > > metadata
> > > > > > > expires
> > > > > > > > > > > every 5 minutes, perhaps we should wait 4 minutes, then
> > > > > starting
> > > > > > > > > fetching
> > > > > > > > > > > the new metadata, which we would expect to arrive by the
> > 5
> > > > > minute
> > > > > > > > > > > deadline.  Or perhaps we should start the fetch even
> > > earlier,
> > > > > > > around
> > > > > > > > > the
> > > > > > > > > > > 2.5 minute mark.  In any case, there should be some
> > > > discussion
> > > > > > > about
> > > > > > > > > what
> > > > > > > > > > > the actual policy is.  Given that metadata.max.age.ms is
> > > > > > > configurable,
> > > > > > > > > > > maybe that policy  needs to be expressed in terms of a
> > > > > percentage
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > refresh period rather than in terms of an absolute delay.
> > > > > > > > > > >
> > > > > > > > > > > The KIP correctly points out that the current metadata
> > > > fetching
> > > > > > > policy
> > > > > > > > > > > causes us to "[block] in a function that's advertised as
> > > > > > > asynchronous."
> > > > > > > > > > > However, the KIP doesn't seem to spell out whether we
> > will
> > > > > > > continue to
> > > > > > > > > > > block if metadata can't be found, or if this will be
> > > > abolished.
> > > > > > > > > Clearly,
> > > > > > > > > > > starting the metadata fetch early will reduce blocking in
> > > the
> > > > > > > common
> > > > > > > > > case,
> > > > > > > > > > > but will there still be blocking in the uncommon case
> > where
> > > > the
> > > > > > > early
> > > > > > > > > fetch
> > > > > > > > > > > doesn't succeed in time?
> > > > > > > > > > >
> > > > > > > > > > >  > To address (2), the producer currently maintains an
> > > expiry
> > > > > > > threshold
> > > > > > > > > > > for
> > > > > > > > > > >  > every topic, which is used to remove a topic from the
> > > > > working
> > > > > > > set
> > > > > > > > > at a
> > > > > > > > > > >  > future time (currently hard-coded to 5 minutes, this
> > > > should
> > > > > be
> > > > > > > > > modified
> > > > > > > > > > > to
> > > > > > > > > > >  > use metadata.max.age.ms). While this does work to
> > > reduce
> > > > > the
> > > > > > > size
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > >  > topic working set, the producer will continue fetching
> > > > > > metadata
> > > > > > > for
> > > > > > > > > > > these
> > > > > > > > > > >  > topics in every metadata request for the full expiry
> > > > > duration.
> > > > > > > This
> > > > > > > > > > > logic
> > > > > > > > > > >  > can be made more intelligent by managing the expiry
> > from
> > > > > when
> > > > > > > the
> > > > > > > > > topic
> > > > > > > > > > >  > was last used, enabling the expiry duration to be
> > > reduced
> > > > to
> > > > > > > improve
> > > > > > > > > > > cases
> > > > > > > > > > >  > where a large number of topics are touched
> > > intermittently.
> > > > > > > > > > >
> > > > > > > > > > > Can you clarify this part a bit?  It seems like we have a
> > > > > > metadata
> > > > > > > > > > > expiration policy now for topics, and we will have one
> > > after
> > > > > this
> > > > > > > KIP,
> > > > > > > > > but
> > > > > > > > > > > they will be somewhat different.  But it's not clear to
> > me
> > > > what
> > > > > > the
> > > > > > > > > > > differences are.
> > > > > > > > > > >
> > > > > > > > > > > In general, if load is a problem, we should probably
> > > consider
> > > > > > > adding
> > > > > > > > > some
> > > > > > > > > > > kind of jitter on the client side.  There are definitely
> > > > cases
> > > > > > > where
> > > > > > > > > people
> > > > > > > > > > > start up a lot of clients at the same time in parallel
> > and
> > > > > there
> > > > > > > is a
> > > > > > > > > > > thundering herd problem with metadata updates.  Adding
> > > jitter
> > > > > > would
> > > > > > > > > spread
> > > > > > > > > > > the load across time rather than creating a spike every 5
> > > > > minutes
> > > > > > > in
> > > > > > > > > this
> > > > > > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > > best,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 8, 2019, at 08:59, Ismael Juma wrote:
> > > > > > > > > > > > I think this KIP affects when we block which is
> > actually
> > > > user
> > > > > > > visible
> > > > > > > > > > > > behavior. Right?
> > > > > > > > > > > >
> > > > > > > > > > > > Ismael
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 8, 2019, 8:50 AM Brian Byrne <
> > > > > bby...@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Guozhang,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding metadata expiry, no access times other than
> > > the
> > > > > > > initial
> > > > > > > > > > > lookup[1]
> > > > > > > > > > > > > are used for determining when to expire producer
> > > > metadata.
> > > > > > > > > Therefore,
> > > > > > > > > > > > > frequently used topics' metadata will be aged out and
> > > > > > > subsequently
> > > > > > > > > > > > > refreshed (in a blocking manner) every five minutes,
> > > and
> > > > > > > > > infrequently
> > > > > > > > > > > used
> > > > > > > > > > > > > topics will be retained for a minimum of five minutes
> > > and
> > > > > > > currently
> > > > > > > > > > > > > refetched on every metadata update during that time
> > > > period.
> > > > > > The
> > > > > > > > > > > sentence is
> > > > > > > > > > > > > suggesting that we could reduce the expiry time to
> > > > improve
> > > > > > the
> > > > > > > > > latter
> > > > > > > > > > > > > without affecting (rather slightly improving) the
> > > former.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Keep in mind that in most all cases, I wouldn't
> > > > anticipate
> > > > > > > much of
> > > > > > > > > a
> > > > > > > > > > > > > difference with producer behavior, and the extra
> > logic
> > > > can
> > > > > be
> > > > > > > > > > > implemented
> > > > > > > > > > > > > to have insignificant cost. It's the large/dynamic
> > > topic
> > > > > > corner
> > > > > > > > > cases
> > > > > > > > > > > that
> > > > > > > > > > > > > we're trying to improve.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It'd be convenient if the KIP is no longer necessary.
> > > > > You're
> > > > > > > right
> > > > > > > > > in
> > > > > > > > > > > that
> > > > > > > > > > > > > there's no public API changes and the behavioral
> > > changes
> > > > > are
> > > > > > > > > entirely
> > > > > > > > > > > > > internal. I'd be happy to continue the discussion
> > > around
> > > > > the
> > > > > > > KIP,
> > > > > > > > > but
> > > > > > > > > > > > > unless otherwise objected, it can be retired.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] Not entirely accurate, it's actually the first
> > time
> > > > > when
> > > > > > > the
> > > > > > > > > client
> > > > > > > > > > > > > calculates whether to retain the topic in its
> > metadata.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Brian
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Nov 7, 2019 at 4:48 PM Guozhang Wang <
> > > > > > > wangg...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello Brian,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Could you elaborate a bit more on this sentence:
> > > "This
> > > > > > logic
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > > made
> > > > > > > > > > > > > > more intelligent by managing the expiry from when
> > the
> > > > > topic
> > > > > > > was
> > > > > > > > > last
> > > > > > > > > > > > > used,
> > > > > > > > > > > > > > enabling the expiry duration to be reduced to
> > improve
> > > > > cases
> > > > > > > > > where a
> > > > > > > > > > > large
> > > > > > > > > > > > > > number of topics are touched intermittently." Not
> > > sure
> > > > I
> > > > > > > fully
> > > > > > > > > > > understand
> > > > > > > > > > > > > > the proposal.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Also since now this KIP did not make any public API
> > > > > changes
> > > > > > > and
> > > > > > > > > the
> > > > > > > > > > > > > > behavioral changes are not considered a public API
> > > > > contract
> > > > > > > (i.e.
> > > > > > > > > > > how we
> > > > > > > > > > > > > > maintain the topic metadata in producer cache is
> > > never
> > > > > > > committed
> > > > > > > > > to
> > > > > > > > > > > > > users),
> > > > > > > > > > > > > > I wonder if we still need a KIP for the proposed
> > > change
> > > > > any
> > > > > > > more?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Nov 7, 2019 at 12:43 PM Brian Byrne <
> > > > > > > bby...@confluent.io
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'd like to propose a vote for a producer change
> > to
> > > > > > improve
> > > > > > > > > > > producer
> > > > > > > > > > > > > > > behavior when dealing with a large number of
> > > topics,
> > > > in
> > > > > > > part by
> > > > > > > > > > > > > reducing
> > > > > > > > > > > > > > > the amount of metadata fetching performed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The full KIP is provided here:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > And the discussion thread:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Brian
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
