Hi Satish,

Thank you for your feedback.

I've nothing against going from Map<String, byte[]> to byte[].
Serialization should not be a problem for RSM implementations: `Struct`,
`Schema` and other useful serde classes are distributed as a part of the
kafka-clients library.

Also a good idea to add the size limiting setting, some
`remote.log.metadata.custom.metadata.max.size`. A sensible default may be
10 KB, which is enough to host a struct with 10 long (almost) 1K symbol
ASCII strings.

If a piece of custom metadata exceeds the limit, the execution of
RLMTask.copyLogSegmentsToRemote should be interrupted with an error message.

Does this sound good?
If so, I'll update the KIP accordingly. And I think it may be time for the
vote after that.

Best,
Ivan



On Sat, 3 Jun 2023 at 17:13, Satish Duggana <satish.dugg...@gmail.com>
wrote:

> Hi Ivan,
> Thanks for the KIP.
>
> The motivation of the KIP looks reasonable to me. It requires a way
> for RSM providers to add custom metadata with the existing
> RemoteLogSegmentMetadata. I remember we wanted to introduce a very
> similar change in the earlier proposals called
> RemoteLogMetadataContext. But we dropped that as we did not feel a
> strong need at that time and we wanted to revisit it if needed. But I
> see there is a clear need for this kind of custom metadata to keep
> with RemoteLogSegmentMetadata.
>
> It is better to introduce a new class for this custom metadata in
> RemoteLogSegmentMetadata like below for any changes in the future.
> RemoteLogSegmentMetadata will have this as an optional value.
>
> public class RemoteLogSegmentMetadata {
> ...
> public static class CustomMetadata {
>      private final byte[] value;
>     ...
> }
> ...
> }
>
> This is completely opaque and it is the RSM implementation provider's
> responsibility in serializing and deserializing the bytes. We can
> introduce a property to guard the size with a configurable property
> with a default value to avoid any unwanted large size values.
>
> Thanks,
> Satish.
>
> On Tue, 30 May 2023 at 10:59, Ivan Yurchenko <ivan0yurche...@gmail.com>
> wrote:
> >
> > Hi all,
> >
> > I want to bring this to a conclusion (positive or negative), so if there
> > are no more questions in a couple of days, I'll put the KIP to the vote.
> >
> > Best,
> > Ivan
> >
> >
> > On Fri, 5 May 2023 at 18:42, Ivan Yurchenko <ivan0yurche...@gmail.com>
> > wrote:
> >
> > > Hi Alexandre,
> > >
> > > > combining custom
> > > > metadata with rlmMetadata increases coupling between Kafka and the
> > > > plugin.
> > >
> > > This is true. However, (if I understand your concern correctly,)
> > > rlmMetadata in the current form may be independent from RSM plugins,
> but
> > > data they point to are accessible only via the particular plugin (the
> one
> > > that wrote the data or a compatible one). It seems, this coupling
> already
> > > exists, but it is implicit. To make my point more concrete, imagine an
> S3
> > > RSM which maps RemoteLogSegmentMetadata objects to S3 object keys. This
> > > mapping logic is a part of the RSM plugin and without it the metadata
> is
> > > useless. I think it will not get worse if (to follow the example) the
> > > plugin makes the said S3 object keys explicit by adding them to the
> > > metadata. From the high level point of view, moving the custom
> metadata to
> > > a separate topic doesn't change the picture: it's still the plugin that
> > > binds the standard and custom metadata together.
> > >
> > >
> > > > For instance, the custom metadata may need to be modified
> > > > outside of Kafka, but the rlmMetadata would still be cached on
> brokers
> > > > independently of any update of custom metadata. Since both types of
> > > > metadata are authored by different systems, and are cached in
> > > > different layers, this may become a problem, or make plugin migration
> > > > more difficult. What do you think?
> > >
> > > This is indeed a problem. I think a solution to this would be to
> clearly
> > > state that metadata being modified outside of Kafka is out of scope and
> > > instruct the plugin authors that custom metadata could be provided only
> > > reactively from the copyLogSegmentData method and must remain immutable
> > > after that. Does it make sense?
> > >
> > >
> > > > Yes, you are right that the suggested alternative is to let the
> plugin
> > > > store its own metadata separately with a solution chosen by the admin
> > > > or plugin provider. For instance, it could be using a dedicated topic
> > > > if chosen to, or relying on an external key-value store.
> > >
> > > I see. Yes, this option always exists and doesn't even require a KIP.
> The
> > > biggest drawback I see is that a plugin will need to reimplement the
> > > consumer/producer + caching mechanics that will exist on the broker
> side
> > > for the standard remote metadata. I'd like to avoid this and this KIP
> is
> > > the best solution I see.
> > >
> > > Best,
> > > Ivan
> > >
> > >
> > >
> > > On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez <
> > > alexandre.dupr...@gmail.com> wrote:
> > >
> > >> Hi Ivan,
> > >>
> > >> Thanks for the follow-up.
> > >>
> > >> Yes, you are right that the suggested alternative is to let the plugin
> > >> store its own metadata separately with a solution chosen by the admin
> > >> or plugin provider. For instance, it could be using a dedicated topic
> > >> if chosen to, or relying on an external key-value store.
> > >>
> > >> I agree with you on the existing risks associated with running
> > >> third-party code inside Apache Kafka. That said, combining custom
> > >> metadata with rlmMetadata increases coupling between Kafka and the
> > >> plugin. For instance, the custom metadata may need to be modified
> > >> outside of Kafka, but the rlmMetadata would still be cached on brokers
> > >> independently of any update of custom metadata. Since both types of
> > >> metadata are authored by different systems, and are cached in
> > >> different layers, this may become a problem, or make plugin migration
> > >> more difficult. What do you think?
> > >>
> > >> I have a vague memory of this being discussed back when the tiered
> > >> storage KIP was started. Maybe Satish has more background on this.
> > >>
> > >> Thanks,
> > >> Alexandre
> > >>
> > >> Le lun. 17 avr. 2023 à 16:50, Ivan Yurchenko
> > >> <ivan0yurche...@gmail.com> a écrit :
> > >> >
> > >> > Hi Alexandre,
> > >> >
> > >> > Thank you for your feedback!
> > >> >
> > >> > > One question I would have is, what is the benefit of adding these
> > >> > > custom metadata in the rlmMetadata rather than letting the plugin
> > >> > > manage access and persistence to them?
> > >> >
> > >> > Could you please elaborate? Do I understand correctly that the idea
> is
> > >> that
> > >> > the plugin will have its own storage for those custom metadata, for
> > >> example
> > >> > a special topic?
> > >> >
> > >> > > It would be possible for a user
> > >> > > to use custom metadata large enough to adversely impact access to
> and
> > >> > > caching of the rlmMetadata by Kafka.
> > >> >
> > >> > Since the custom metadata is 100% under control of the RSM plugin,
> the
> > >> risk
> > >> > is as big as the risk of running a third-party code (i.e. the RSM
> > >> plugin).
> > >> > The cluster admin must make the decision if they trust it.
> > >> > To mitigate this risk and put it under control, the RSM plugin
> > >> > implementations could document what custom metadata they use and
> > >> estimate
> > >> > their size.
> > >> >
> > >> > Best,
> > >> > Ivan
> > >> >
> > >> >
> > >> > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez <
> > >> alexandre.dupr...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Ivan,
> > >> > >
> > >> > > Thank you for the KIP.
> > >> > >
> > >> > > I think the KIP clearly explains the need for out-of-band metadata
> > >> > > authored and used by an implementation of the remote storage
> manager.
> > >> > > One question I would have is, what is the benefit of adding these
> > >> > > custom metadata in the rlmMetadata rather than letting the plugin
> > >> > > manage access and persistence to them?
> > >> > >
> > >> > > Maybe one disadvantage and potential risk with the approach
> proposed
> > >> > > in the KIP is that the rlmMetadata is not of a predefined,
> relatively
> > >> > > constant size (although corner cases with thousands of leader
> epochs
> > >> > > in the leader epoch map are possible). It would be possible for a
> user
> > >> > > to use custom metadata large enough to adversely impact access to
> and
> > >> > > caching of the rlmMetadata by Kafka.
> > >> > >
> > >> > > Thanks,
> > >> > > Alexandre
> > >> > >
> > >> > > Le jeu. 6 avr. 2023 à 16:03, hzh0425 <hzhka...@163.com> a écrit :
> > >> > > >
> > >> > > > I think it's a good idea as we may want to store remote
> segments in
> > >> > > different buckets
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > | |
> > >> > > > hzhka...@163.com
> > >> > > > |
> > >> > > > |
> > >> > > > 邮箱:hzhka...@163.com
> > >> > > > |
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > ---- 回复的原邮件 ----
> > >> > > > | 发件人 | Ivan Yurchenko<ivan0yurche...@gmail.com> |
> > >> > > > | 日期 | 2023年04月06日 22:37 |
> > >> > > > | 收件人 | dev@kafka.apache.org<dev@kafka.apache.org> |
> > >> > > > | 抄送至 | |
> > >> > > > | 主题 | [DISCUSS] KIP-917: Additional custom metadata for remote
> log
> > >> > > segment |
> > >> > > > Hello!
> > >> > > >
> > >> > > > I would like to start the discussion thread on KIP-917:
> Additional
> > >> custom
> > >> > > > metadata for remote log segment [1]
> > >> > > > This KIP is fairly small and proposes to add a new field to the
> > >> remote
> > >> > > > segment metadata.
> > >> > > >
> > >> > > > Thank you!
> > >> > > >
> > >> > > > Best,
> > >> > > > Ivan
> > >> > > >
> > >> > > > [1]
> > >> > > >
> > >> > >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
> > >> > >
> > >>
> > >
>

Reply via email to