Hi Ivan,
Thanks for the KIP.

The motivation of the KIP looks reasonable to me. It requires a way
for RSM providers to add custom metadata with the existing
RemoteLogSegmentMetadata. I remember we wanted to introduce a very
similar change in the earlier proposals called
RemoteLogMetadataContext. But we dropped that as we did not feel a
strong need at that time and we wanted to revisit it if needed. But I
see there is a clear need for this kind of custom metadata to keep
with RemoteLogSegmentMetadata.

It is better to introduce a new class for this custom metadata in
RemoteLogSegmentMetadata like below for any changes in the future.
RemoteLogSegmentMetadata will have this as an optional value.

public class RemoteLogSegmentMetadata {
...
public static class CustomMetadata {
     private final byte[] value;
    ...
}
...
}

This is completely opaque and it is the RSM implementation provider's
responsibility in serializing and deserializing the bytes. We can
introduce a property to guard the size with a configurable property
with a default value to avoid any unwanted large size values.

Thanks,
Satish.

On Tue, 30 May 2023 at 10:59, Ivan Yurchenko <ivan0yurche...@gmail.com> wrote:
>
> Hi all,
>
> I want to bring this to a conclusion (positive or negative), so if there
> are no more questions in a couple of days, I'll put the KIP to the vote.
>
> Best,
> Ivan
>
>
> On Fri, 5 May 2023 at 18:42, Ivan Yurchenko <ivan0yurche...@gmail.com>
> wrote:
>
> > Hi Alexandre,
> >
> > > combining custom
> > > metadata with rlmMetadata increases coupling between Kafka and the
> > > plugin.
> >
> > This is true. However, (if I understand your concern correctly,)
> > rlmMetadata in the current form may be independent from RSM plugins, but
> > data they point to are accessible only via the particular plugin (the one
> > that wrote the data or a compatible one). It seems, this coupling already
> > exists, but it is implicit. To make my point more concrete, imagine an S3
> > RSM which maps RemoteLogSegmentMetadata objects to S3 object keys. This
> > mapping logic is a part of the RSM plugin and without it the metadata is
> > useless. I think it will not get worse if (to follow the example) the
> > plugin makes the said S3 object keys explicit by adding them to the
> > metadata. From the high level point of view, moving the custom metadata to
> > a separate topic doesn't change the picture: it's still the plugin that
> > binds the standard and custom metadata together.
> >
> >
> > > For instance, the custom metadata may need to be modified
> > > outside of Kafka, but the rlmMetadata would still be cached on brokers
> > > independently of any update of custom metadata. Since both types of
> > > metadata are authored by different systems, and are cached in
> > > different layers, this may become a problem, or make plugin migration
> > > more difficult. What do you think?
> >
> > This is indeed a problem. I think a solution to this would be to clearly
> > state that metadata being modified outside of Kafka is out of scope and
> > instruct the plugin authors that custom metadata could be provided only
> > reactively from the copyLogSegmentData method and must remain immutable
> > after that. Does it make sense?
> >
> >
> > > Yes, you are right that the suggested alternative is to let the plugin
> > > store its own metadata separately with a solution chosen by the admin
> > > or plugin provider. For instance, it could be using a dedicated topic
> > > if chosen to, or relying on an external key-value store.
> >
> > I see. Yes, this option always exists and doesn't even require a KIP. The
> > biggest drawback I see is that a plugin will need to reimplement the
> > consumer/producer + caching mechanics that will exist on the broker side
> > for the standard remote metadata. I'd like to avoid this and this KIP is
> > the best solution I see.
> >
> > Best,
> > Ivan
> >
> >
> >
> > On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez <
> > alexandre.dupr...@gmail.com> wrote:
> >
> >> Hi Ivan,
> >>
> >> Thanks for the follow-up.
> >>
> >> Yes, you are right that the suggested alternative is to let the plugin
> >> store its own metadata separately with a solution chosen by the admin
> >> or plugin provider. For instance, it could be using a dedicated topic
> >> if chosen to, or relying on an external key-value store.
> >>
> >> I agree with you on the existing risks associated with running
> >> third-party code inside Apache Kafka. That said, combining custom
> >> metadata with rlmMetadata increases coupling between Kafka and the
> >> plugin. For instance, the custom metadata may need to be modified
> >> outside of Kafka, but the rlmMetadata would still be cached on brokers
> >> independently of any update of custom metadata. Since both types of
> >> metadata are authored by different systems, and are cached in
> >> different layers, this may become a problem, or make plugin migration
> >> more difficult. What do you think?
> >>
> >> I have a vague memory of this being discussed back when the tiered
> >> storage KIP was started. Maybe Satish has more background on this.
> >>
> >> Thanks,
> >> Alexandre
> >>
> >> Le lun. 17 avr. 2023 à 16:50, Ivan Yurchenko
> >> <ivan0yurche...@gmail.com> a écrit :
> >> >
> >> > Hi Alexandre,
> >> >
> >> > Thank you for your feedback!
> >> >
> >> > > One question I would have is, what is the benefit of adding these
> >> > > custom metadata in the rlmMetadata rather than letting the plugin
> >> > > manage access and persistence to them?
> >> >
> >> > Could you please elaborate? Do I understand correctly that the idea is
> >> that
> >> > the plugin will have its own storage for those custom metadata, for
> >> example
> >> > a special topic?
> >> >
> >> > > It would be possible for a user
> >> > > to use custom metadata large enough to adversely impact access to and
> >> > > caching of the rlmMetadata by Kafka.
> >> >
> >> > Since the custom metadata is 100% under control of the RSM plugin, the
> >> risk
> >> > is as big as the risk of running a third-party code (i.e. the RSM
> >> plugin).
> >> > The cluster admin must make the decision if they trust it.
> >> > To mitigate this risk and put it under control, the RSM plugin
> >> > implementations could document what custom metadata they use and
> >> estimate
> >> > their size.
> >> >
> >> > Best,
> >> > Ivan
> >> >
> >> >
> >> > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez <
> >> alexandre.dupr...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Ivan,
> >> > >
> >> > > Thank you for the KIP.
> >> > >
> >> > > I think the KIP clearly explains the need for out-of-band metadata
> >> > > authored and used by an implementation of the remote storage manager.
> >> > > One question I would have is, what is the benefit of adding these
> >> > > custom metadata in the rlmMetadata rather than letting the plugin
> >> > > manage access and persistence to them?
> >> > >
> >> > > Maybe one disadvantage and potential risk with the approach proposed
> >> > > in the KIP is that the rlmMetadata is not of a predefined, relatively
> >> > > constant size (although corner cases with thousands of leader epochs
> >> > > in the leader epoch map are possible). It would be possible for a user
> >> > > to use custom metadata large enough to adversely impact access to and
> >> > > caching of the rlmMetadata by Kafka.
> >> > >
> >> > > Thanks,
> >> > > Alexandre
> >> > >
> >> > > Le jeu. 6 avr. 2023 à 16:03, hzh0425 <hzhka...@163.com> a écrit :
> >> > > >
> >> > > > I think it's a good idea as we may want to store remote segments in
> >> > > different buckets
> >> > > >
> >> > > >
> >> > > >
> >> > > > | |
> >> > > > hzhka...@163.com
> >> > > > |
> >> > > > |
> >> > > > 邮箱:hzhka...@163.com
> >> > > > |
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > ---- 回复的原邮件 ----
> >> > > > | 发件人 | Ivan Yurchenko<ivan0yurche...@gmail.com> |
> >> > > > | 日期 | 2023年04月06日 22:37 |
> >> > > > | 收件人 | dev@kafka.apache.org<dev@kafka.apache.org> |
> >> > > > | 抄送至 | |
> >> > > > | 主题 | [DISCUSS] KIP-917: Additional custom metadata for remote log
> >> > > segment |
> >> > > > Hello!
> >> > > >
> >> > > > I would like to start the discussion thread on KIP-917: Additional
> >> custom
> >> > > > metadata for remote log segment [1]
> >> > > > This KIP is fairly small and proposes to add a new field to the
> >> remote
> >> > > > segment metadata.
> >> > > >
> >> > > > Thank you!
> >> > > >
> >> > > > Best,
> >> > > > Ivan
> >> > > >
> >> > > > [1]
> >> > > >
> >> > >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
> >> > >
> >>
> >

Reply via email to