Re: [DISCUSS] KIP-1002: Fetch remote segment indexes at once

Jorge Esteban Quilcate Otoya Mon, 13 Nov 2023 23:30:03 -0800

Divij, thanks for your prompt feedback!

1. Agree, caching at the plugin level was my initial idea as well; though,
keeping two caches for the same data both at the broker and at the plugin
seems wasteful. (added this as a rejected alternative in the meantime)


2. Not necessarily. The API allows requesting a set of indexes. In the
case of the `RemoteIndexCache`, as it's currently implemented, it would
request the [offset, time, transaction] index types.
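To make the "set of indexes" idea concrete, here is a rough sketch (not the
KIP's actual interface). `FakeRemoteStore`, `fetch_index` and `fetch_indexes`
are hypothetical names used only for illustration; the store just counts
round trips so the two call styles can be compared.

```python
from enum import Enum


class IndexType(Enum):
    # Hypothetical enum covering the three index types discussed here.
    OFFSET = "offset"
    TIMESTAMP = "timestamp"
    TRANSACTION = "transaction"


class FakeRemoteStore:
    """Hypothetical stand-in for a remote storage plugin; counts round trips."""

    def __init__(self):
        self.round_trips = 0

    def fetch_index(self, segment_id, index_type):
        # Per-index style: one remote call for each index.
        self.round_trips += 1
        return f"{segment_id}:{index_type.value}".encode()

    def fetch_indexes(self, segment_id, index_types):
        # Batched style: one remote call for the whole requested set.
        self.round_trips += 1
        return {t: f"{segment_id}:{t.value}".encode() for t in index_types}


wanted = {IndexType.OFFSET, IndexType.TIMESTAMP, IndexType.TRANSACTION}

one_by_one = FakeRemoteStore()
for index_type in wanted:
    one_by_one.fetch_index("segment-0", index_type)

batched = FakeRemoteStore()
indexes = batched.fetch_indexes("segment-0", wanted)
```

The caller decides which types go into the set, so the same call could just
as well request only [offset] or [offset, transaction].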

However, I see your point that there may be scenarios where only 1 of the 3
indexes is used:
- The Time index is mostly used once, to seek an offset by time before
fetching sequentially.
- The Offset and Transaction indexes are probably the only ones that make
sense to cache, as they are used on every fetch.
Arguably, Transaction indexes are not as commonly used, which reduces the
benefits of the proposed approach: from initially expecting to fetch 3
indexes at once, to potentially fetching only 2 (offset, txn), but most
probably fetching 1 (offset).

If there's value perceived from fetching Offset and Transaction together,
we can keep discussing this KIP. In the meantime, I will look into the
approach to lazily fetch indexes while waiting for additional feedback.
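
As a starting point for that investigation, a minimal sketch of lazy,
on-demand fetching could look like the following. All names
(`LazyIndexCache`, `indexes_needed`, `fake_remote_fetch`) are hypothetical,
and the mapping from fetch type to required indexes follows Divij's
description (transaction index only for read_committed, time index only
when searching by timestamp); it is an assumption, not the broker's actual
logic.

```python
class LazyIndexCache:
    """Hypothetical cache that downloads each index only on first use."""

    def __init__(self, fetch_index):
        self._fetch_index = fetch_index  # callable(segment_id, index_name) -> bytes
        self._cache = {}
        self.remote_calls = []  # records which indexes were actually downloaded

    def get(self, segment_id, index_name):
        key = (segment_id, index_name)
        if key not in self._cache:
            # Cache miss: go to the remote tier for this one index only.
            self.remote_calls.append(key)
            self._cache[key] = self._fetch_index(segment_id, index_name)
        return self._cache[key]


def indexes_needed(read_committed, seek_by_time):
    # Assumption: the offset index is needed on every fetch, the others
    # only for the request types named in this thread.
    needed = ["offset"]
    if read_committed:
        needed.append("transaction")
    if seek_by_time:
        needed.append("time")
    return needed


def fake_remote_fetch(segment_id, index_name):
    # Stand-in for a remote call; returns placeholder bytes.
    return f"{segment_id}/{index_name}".encode()


cache = LazyIndexCache(fake_remote_fetch)

# Two plain fetches (read_uncommitted, by offset) against the same segment.
for _ in range(2):
    for name in indexes_needed(read_committed=False, seek_by_time=False):
        cache.get("segment-0", name)

# One read_committed fetch adds the transaction index, and only that.
for name in indexes_needed(read_committed=True, seek_by_time=False):
    cache.get("segment-0", name)
```

In this sketch the time index is never downloaded, and the transaction
index is downloaded only once a read_committed fetch asks for it.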

Cheers,
Jorge.

On Mon, 13 Nov 2023 at 16:51, Divij Vaidya <divijvaidy...@gmail.com> wrote:

> Hi Jorge
>
> 1. I don't think we need a new API here because alternative solutions
> exist even with the current API. As an example, when the first index is
> fetched, the RSM plugin can choose to download all indexes and cache them
> locally. On the next call to fetch an index from the remote tier, we will
> hit the cache and retrieve the index from there.
>
> 2. The KIP assumes that all indexes are required at all times. However,
> indexes such as transaction indexes are only required for read_committed
> fetches and time index is only required when a fetch call wants to search
> offset by timestamp. As a future step in Tiered Storage, I would actually
> prefer to move towards a direction where we are lazily fetching indexes
> on-demand instead of fetching them together as proposed in the KIP.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Nov 10, 2023 at 4:00 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > I would like to start the discussion on a KIP for Tiered Storage. It's
> > about improving cross-segment latencies by reducing calls to fetch
> > indexes individually.
> > Have a look:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once
> >
> > Cheers,
> > Jorge
> >
>
