Luke, Thanks for the review!
DelayedFetch and DelayedRemoteFetch are orthogonal calls <https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1559> . Only one of them will be active in a given FETCH request. The fetch request might take more than 500 ms when the time taken to read data from remote storage exceeds 500 ms and `remote.fetch.max.wait.ms` is configured higher than 500 ms. -- Kamal On Fri, May 3, 2024 at 1:55 PM Luke Chen <show...@gmail.com> wrote: > Hi Kamal, > > Thanks for the KIP! > Sorry for the late review. > > Overall LGTM! Just 1 question: > > If one fetch request contains 2 partitions: [p1, p2] > fetch.max.wait.ms: 500, remote.fetch.max.wait.ms: 1000 > > And now, p1 fetch offset is the log end offset and has no new data coming, > and p2 fetch offset is to fetch from remote storage. > And suppose the fetch from remote storage takes 1000ms. > So, question: > Will this fetch request return in 500ms or 1000ms? > And what will be returned? > > I think before this change, it'll return within 500ms, right? > But it's not clear what behavior it will be after this KIP. > > Thank you. > Luke > > > On Fri, May 3, 2024 at 1:56 PM Kamal Chandraprakash < > kamal.chandraprak...@gmail.com> wrote: > > > Christo, > > > > We have localTimeMs, remoteTimeMs, and totalTimeMs as part of the > > FetchConsumer request metric. > > > > > > > kafka.network:type=RequestMetrics,name={LocalTimeMs|RemoteTimeMs|TotalTimeMs},request={Produce|FetchConsumer|FetchFollower} > > > > RemoteTimeMs refers to the amount of time spent in the purgatory for > normal > > fetch requests > > and amount of time spent in reading the remote data for remote-fetch > > requests. Do we want > > to have a separate `TieredStorageTimeMs` to capture the time spent in > > remote-read requests? > > > > With per-broker level timer metrics combined with the request level > > metrics, the user will have > > sufficient information. > > > > Metric name = > > > > > kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs > > > > -- > > Kamal > > > > On Mon, Apr 29, 2024 at 1:38 PM Christo Lolov <christolo...@gmail.com> > > wrote: > > > > > Heya! > > > > > > Is it difficult to instead add the metric at > > > kafka.network:type=RequestMetrics,name=TieredStorageMs (or some other > > > name=*)? Alternatively, if it is difficult to add it there, is it > > possible > > > to add 2 metrics, one at the RequestMetrics level (even if it is > > > total-time-ms - (all other times)) and one at what you are proposing? > As > > an > > > operator I would find it strange to not see the metric in the > > > RequestMetrics. > > > > > > Your thoughts? > > > > > > Best, > > > Christo > > > > > > On Sun, 28 Apr 2024 at 10:52, Kamal Chandraprakash < > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > Christo, > > > > > > > > Updated the KIP with the remote fetch latency metric. Please take > > another > > > > look! > > > > > > > > -- > > > > Kamal > > > > > > > > On Sun, Apr 28, 2024 at 12:23 PM Kamal Chandraprakash < > > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > Hi Federico, > > > > > > > > > > Thanks for the suggestion! Updated the config name to " > > > > > remote.fetch.max.wait.ms". > > > > > > > > > > Christo, > > > > > > > > > > Good point. We don't have the remote-read latency metrics to > measure > > > the > > > > > performance of the remote read requests. I'll update the KIP to > emit > > > this > > > > > metric. > > > > > > > > > > -- > > > > > Kamal > > > > > > > > > > > > > > > On Sat, Apr 27, 2024 at 4:03 PM Federico Valeri < > > fedeval...@gmail.com> > > > > > wrote: > > > > > > > > > >> Hi Kamal, it looks like all TS configurations starts with > "remote." > > > > >> prefix, so I was wondering if we should name it > > > > >> "remote.fetch.max.wait.ms". > > > > >> > > > > >> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash > > > > >> <kamal.chandraprak...@gmail.com> wrote: > > > > >> > > > > > >> > Hi all, > > > > >> > > > > > >> > If there are no more comments, I'll start a vote thread by > > tomorrow. > > > > >> > Please review the KIP. > > > > >> > > > > > >> > Thanks, > > > > >> > Kamal > > > > >> > > > > > >> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash < > > > > >> > kamal.chandraprak...@gmail.com> wrote: > > > > >> > > > > > >> > > Hi all, > > > > >> > > > > > > >> > > Bumping the thread. Please review this KIP. Thanks! > > > > >> > > > > > > >> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash < > > > > >> > > kamal.chandraprak...@gmail.com> wrote: > > > > >> > > > > > > >> > >> Hi Jorge, > > > > >> > >> > > > > >> > >> Thanks for the review! Added your suggestions to the KIP. > PTAL. > > > > >> > >> > > > > >> > >> The `fetch.max.wait.ms` config will be also applicable for > > > topics > > > > >> > >> enabled with remote storage. > > > > >> > >> Updated the description to: > > > > >> > >> > > > > >> > >> ``` > > > > >> > >> The maximum amount of time the server will block before > > answering > > > > the > > > > >> > >> fetch request > > > > >> > >> when it is reading near to the tail of the partition > > > > >> (high-watermark) and > > > > >> > >> there isn't > > > > >> > >> sufficient data to immediately satisfy the requirement given > by > > > > >> > >> fetch.min.bytes. > > > > >> > >> ``` > > > > >> > >> > > > > >> > >> -- > > > > >> > >> Kamal > > > > >> > >> > > > > >> > >> On Thu, Feb 1, 2024 at 12:12 AM Jorge Esteban Quilcate Otoya > < > > > > >> > >> quilcate.jo...@gmail.com> wrote: > > > > >> > >> > > > > >> > >>> Hi Kamal, > > > > >> > >>> > > > > >> > >>> Thanks for this KIP! It should help to solve one of the main > > > > issues > > > > >> with > > > > >> > >>> tiered storage at the moment that is dealing with individual > > > > >> consumer > > > > >> > >>> configurations to avoid flooding logs with interrupted > > > exceptions. > > > > >> > >>> > > > > >> > >>> One of the topics discussed in [1][2] was on the semantics > of > > ` > > > > >> > >>> fetch.max.wait.ms` and how it's affected by remote storage. > > > > Should > > > > >> we > > > > >> > >>> consider within this KIP the update of `fetch.max.wail.ms` > > docs > > > > to > > > > >> > >>> clarify > > > > >> > >>> it only applies to local storage? > > > > >> > >>> > > > > >> > >>> Otherwise, LGTM -- looking forward to see this KIP adopted. > > > > >> > >>> > > > > >> > >>> [1] https://issues.apache.org/jira/browse/KAFKA-15776 > > > > >> > >>> [2] > > > > >> > https://github.com/apache/kafka/pull/14778#issuecomment-1820588080 > > > > >> > >>> > > > > >> > >>> On Tue, 30 Jan 2024 at 01:01, Kamal Chandraprakash < > > > > >> > >>> kamal.chandraprak...@gmail.com> wrote: > > > > >> > >>> > > > > >> > >>> > Hi all, > > > > >> > >>> > > > > > >> > >>> > I have opened a KIP-1018 > > > > >> > >>> > < > > > > >> > >>> > > > > > >> > >>> > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests > > > > >> > >>> > > > > > > >> > >>> > to introduce dynamic max-remote-fetch-timeout broker > config > > to > > > > >> give > > > > >> > >>> more > > > > >> > >>> > control to the operator. > > > > >> > >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >> > >>> > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests > > > > >> > >>> > > > > > >> > >>> > Let me know if you have any feedback or suggestions. > > > > >> > >>> > > > > > >> > >>> > -- > > > > >> > >>> > Kamal > > > > >> > >>> > > > > > >> > >>> > > > > >> > >> > > > > >> > > > > > > > > > > > > > > >