On Tue, Nov 21, 2017, at 18:56, Jun Rao wrote:
> Hi, Colin,
> 
> Thanks for the KIP. A few comments below.
> 
> 1. Currently, if num.replica.fetchers is configured with a value larger
> than 1, a broker will be using multiple fetcher threads to fetch from
> another broker. So, there will be multiple concurrent fetch requests from
> a given follower broker. Generating the UUID just based on the replica id
> probably won't be enough in this case.

Very good point.  Each fetcher thread should have its own UUID, and
each fetcher thread fetches its own disjoint set of partitions, so
their sessions won't overlap.

And then, as you point out, we don't have to key the cache off of
broker IDs at all.  We probably should still enforce the invariant that
we cache at most one set of fetch offsets per partition per follower
broker.
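
Concretely, I'm picturing something like this on the follower side.
This is just a sketch with placeholder names (FetcherThread,
fetchSessionId), not actual broker code:

    import java.util.UUID;

    // Sketch only: each fetcher thread generates its own session UUID
    // when it starts, so concurrent fetcher threads from the same
    // follower broker never collide in the leader's cache.
    public class FetcherThread extends Thread {
        // One UUID per fetcher thread, not one per broker.
        private final UUID fetchSessionId = UUID.randomUUID();

        // Each thread fetches its own disjoint set of partitions and
        // identifies itself to the leader with fetchSessionId.
    }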

> 2. It's not very clear to me how the follower knows when to include the
> skipped partitions again when new data is published to those partitions
> in the future.

Hmm, I hope I'm not misinterpreting the question.  But the idea is that
the follower never really "skips" a partition: the leader knows which
partitions each incremental fetch request refers to, since they are
always implicitly included in both the request and the response.  When
new data is published to one of those partitions, the leader simply
returns it in the next incremental fetch response.
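
On the leader side, I imagine a per-session view of the partition set,
roughly like this.  Again, the names are made up, and I'm using strings
where the real code would use TopicPartition:

    import java.util.Map;
    import java.util.Set;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: the leader maps each fetch session UUID to the full
    // set of partitions that session covers.  An incremental fetch
    // request doesn't re-list them; the cached set is always implied.
    public class FetchSessionCache {
        private final Map<UUID, Set<String>> sessionPartitions =
            new ConcurrentHashMap<>();

        public Set<String> partitionsFor(UUID sessionId) {
            return sessionPartitions.get(sessionId);
        }
    }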

> 3. When replica.fetch.response.max.bytes is exceeded, the leader stops
> giving data back for the remaining partitions in the fetch response. In
> that case, we don't want to skip those partitions with empty data in the
> IncrementalFetchRequest since they may actually have data.
>
> 4. Similar to #3, if replication throttling is enabled, it's possible for
> the leader to give empty data in a partition even though the partition
> has new data. In that case, it's not clear if the follower should blindly skip
> that partition in the IncrementalFetchRequest.

It seems like if data for a partition cannot be returned, whether
because of throttling or because replica.fetch.response.max.bytes has
been reached, the leader can simply defer sending that data until the
next incremental fetch.  The partition stays in the session, so the
follower never treats it as empty and skips it.  This is pretty similar
to the current situation with full fetch requests that can't return all
of the available data...
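
In pseudocode-ish Java, the leader-side behavior I have in mind looks
roughly like this (a sketch under those assumptions; partition names
are strings for brevity, and partitionSizes is assumed to preserve
insertion order, e.g. a LinkedHashMap):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Sketch only: the leader adds partition data to the response in
    // order and stops once the byte limit would be exceeded.  Deferred
    // partitions stay in the fetch session and are retried on the next
    // incremental fetch; they are never reported back as empty.
    public class ResponseLimiter {
        static List<String> fillResponse(Map<String, Integer> partitionSizes,
                                         int maxResponseBytes) {
            List<String> included = new ArrayList<>();
            int bytesSoFar = 0;
            for (Map.Entry<String, Integer> e : partitionSizes.entrySet()) {
                if (bytesSoFar + e.getValue() > maxResponseBytes)
                    break;  // defer the rest until the next fetch
                included.add(e.getKey());
                bytesSoFar += e.getValue();
            }
            return included;
        }
    }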

> 5. Currently, the leader maintains a _lastCaughtUpTimeMs. If a follower
> stops fetching a partition and _lastCaughtUpTimeMs is not updated, the
> follower will fall out of ISR. So, will the leader remember all the
> partitions in the last full fetch request so that it can keep
> updating _lastCaughtUpTimeMs when serving IncrementalFetchRequest?

Yes.  Fetching partition data via an incremental fetch request should
advance _lastCaughtUpTimeMs, and since the leader knows the full set of
partitions in each fetch session, it can keep updating
_lastCaughtUpTimeMs even for partitions the request doesn't explicitly
list.
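
Very roughly (placeholder names again, strings instead of
TopicPartition):

    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: because the session remembers every partition the
    // follower is fetching, the leader can advance lastCaughtUpTimeMs
    // for all of them on each incremental fetch, not just for the
    // partitions the request explicitly listed.
    public class FollowerProgress {
        private final Map<String, Long> lastCaughtUpTimeMs =
            new ConcurrentHashMap<>();

        void onIncrementalFetch(Set<String> sessionPartitions,
                                Set<String> caughtUpPartitions, long nowMs) {
            for (String part : sessionPartitions) {
                if (caughtUpPartitions.contains(part))
                    lastCaughtUpTimeMs.put(part, nowMs);
            }
        }
    }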

best,
Colin

> 
> Jun
> 
> 
> On Tue, Nov 21, 2017 at 1:02 PM, Colin McCabe <cmcc...@apache.org> wrote:
> 
> > Hi all,
> >
> > I created a KIP to improve the scalability and latency of FetchRequest:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-227%3A+Introduce+Incremental+FetchRequests+to+Increase+Partition+Scalability
> >
> > Please take a look.
> >
> > cheers,
> > Colin
> >
