Hi Xinyu,

Thanks for the proposal!
I agree abstracting the log and logSegment layers can allow users to
implement any kinds of storage mediums.

Some high-level comments:
1. I expect to see the future plan of this KIP, but can't find it.
What do you expect to achieve the "shared storage" goal after the Unified
Shared Storage KIP?
Maybe "Replication protocol API change" for phase 2? Maybe "Useful tools to
interact with shared storage" for next phase? I don't know, but just want
to know what is the full design in your thought?

2. Why is the "Stream layer" optional? Currently, in each LogSegment, it
contains FileRecords, indexes, producer snapshots,... etc implementation in
the file-based storage. After this KIP, there is no implementation for
SharedLogSegment, right? If we don't have Stream Layer, how do users rely
on the sharedLog abstraction to develop for their own storage?

3. Following (2), because we abstract the log layer now, we need all the
detailed information to instruct users to implement what they desire.
Suppose that is the Stream Layer, I expect there will be more detailed
explanation about it, like:
(a) What are the methods in the interface doing? When will they be invoked?
Why ...
(b) When and how will the interface interact with shareLogSegment? Like
when will the meta stream come into play?
(c) What is the motivation about storing the metadata in KRaft, instead of
another internal topic or creating another interface or something else?
(d) The detail of messages, i.e. GET_KVS, PUT_KVS, ... etc.


Thank you.
Luke

On Tue, May 13, 2025 at 8:22 PM Xinyu Zhou <yu...@apache.org> wrote:

> Dear Kafka Community,
>
> I am proposing a new KIP to introduce a unified shared storage
> solution for Kafka, aiming
> to enhance its scalability and flexibility. This KIP is inspired by
> the ongoing discussions
> around KIP-1150 and KIP-1176, which explore leveraging object storage
> to achieve cost and
> elasticity benefits. These efforts are commendable, but given the
> widespread adoption of
> Kafka's classic shared-nothing architecture, especially in on-premise
> environments, we
> need a unified approach that supports a smooth transition from
> shared-nothing to shared
> storage. This KIP proposes refactoring the log layer to support both
> architectures
> simultaneously, ensuring long-term compatibility and allowing Kafka to
> fully leverage
> shared storage services like S3, HDFS, and NFS.
>
> The core of this proposal includes introducing abstract log and log
> segment classes and a
> new 'Stream' API to bridge the gap between shared storage services and
> Kafka's storage
> layer. This unified solution will enable Kafka to evolve while
> maintaining backward
> compatibility, supporting both on-premise and cloud deployments. I
> believe this approach
> is crucial for Kafka's continued success and look forward to your
> thoughts and feedback.
>
>
> Link to the KIP for more details:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1183%3A+Unified+Shared+Storage
>
> Best regards,
>
> Xinyu
>

Reply via email to