Hello all, I would like to call a vote soon so if anyone wants to contribute to the discussion or bring up other concerns, you are welcome to comment.
Thanks, Olasoji -----Original Message----- From: Denloye, Olasoji <olasoji.denl...@intel.com> Sent: Wednesday, April 23, 2025 10:22 AM To: dev@kafka.apache.org Subject: RE: [DISCUSS] KIP-1121: Compression acceleration in Kafka Hi All, This is a gentle reminder about open discussion on this compression acceleration KIP. I would love to hear your thoughts. Thanks, Olasoji -----Original Message----- From: Denloye, Olasoji <olasoji.denl...@intel.com> Sent: Monday, April 7, 2025 8:49 AM To: dev@kafka.apache.org Subject: RE: [DISCUSS] KIP-1121: Compression acceleration in Kafka Hi Greg, I have refactored the proposal to reduce the API surface area. I introduced a new CompressionService interface which contains basic compression/decompression functions. A service provider, when supplied, would be wrapped in a concrete 'Compression' and run similar to the existing compression codecs. The 'Compression' interface and its subclasses have been decoupled from these changes. 6. based on your feedback, offloading would occur if and only if the specific configuration is set, even if multiple service providers are available 8. The threat model is that dynamically loading any untrusted or untested jar on the classpath can be dangerous. The purpose of bringing this up is to discuss how service provider (plugins) is distributed and installed. On one hand, a centralized Kafka repository and on the other a decentralized approach where plugins are developed and installed on the fly. Which is preferable? The choice would determine what security mitigations are needed if any. https://cwiki.apache.org/confluence/x/Xgr0Ew Thanks, Olasoji -----Original Message----- From: Greg Harris <greg.har...@aiven.io.INVALID> Sent: Monday, February 24, 2025 12:42 PM To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-1121: Compression acceleration in Kafka Hi Olasoji, Thanks for the update. 1. Could you add these as code examples to the KIP, and not as prose? See [1-3] for some examples of well-written public interfaces. 2. I find the proposed API surface area excessive, and think we need to narrow it down. Looking at what's merged on trunk right now and not including your proposed modifications, this KIP includes: * 2 interfaces * 12 classes * 5 constructors * 8 static methods * 2 static variables * 18 unique instance methods Committing to long-term stability of this many classes and interfaces as public API will make it hard to initially review it, and innovate on it in the future. 4. Is there a way to modify the existing signatures/implementations such that we don't have to expose any subclasses of Compression as public API, only the CompressionTypes? 6. What does "first provider loaded" mean to users? AFAIU, this looks arbitrary to users, as it depends on classpath ordering, filesystem ordering, etc. 8. I'm trying to understand under what conditions this can be attacked (threat model), and what mitigations Kafka needs to provide (implementation). Is Kafka providing special filtering? Is Kafka providing the custom class loader? 6, 7, 8. Please add these details to the KIP so that other reviewers don't need to follow the discussion thread so closely. 9. The testing section looks like it is written for individuals implementing accelerator plugins. Could you also write about how the Kafka side of the implementation will be tested? Thanks, Greg [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging [2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage [3] https://cwiki.apache.org/confluence/display/KAFKA/KIP-877%3A+Mechanism+for+plugins+and+connectors+to+register+metrics On Fri, Feb 21, 2025 at 12:15 AM Denloye, Olasoji <olasoji.denl...@intel.com> wrote: > Hi Greg, Thanks for your response. > > 1. Some changes will be made to the Compression and Builder > interfaces. At least one default method, isAvailable(), would be added > to the Builder interface. The static methods in the Compression > interface that create and return specific builders would be modified > to return a new builder or a loaded builder from a service provider if > it exists. The return types may need to be more generic e.g > Builder<GzipCompression> vs GzipCompression.Builder. > 2. These classes are appropriate to add to the API surface as they may > be required to develop efficient compression implementations. I will > modify the KIP. > 4. This is because the Compression interface (currently) expects a > Builder of type 'GzipCompression.Builder'. I agree that alternative > implementations need not extend the base implementation with some > modifications to the Compression interface. However if an alternative > implementation is registered as a service provider of GzipCompression > then it must be of type 'CompressionType.GZIP' and it must be fully > compatible with the base implementation of GzipCompression. > 5. So far no, but 'Configurable' looks promising 6. If no specific > configuration is set, then the first provider loaded and available > will be used. If a specific configuration is set then that specific > provider will be used if available else the base implementation. > 7. Yes, a ConfigException will be thrown if the class specified in the > config is not found on the classpath. > 8. This can be implemented by using special filtering or a custom > class loader. The concern here is that any class on the class path > (trusted or > not) could be loaded which could intercept messages or corrupt > messages by compressing using an unknown algorithm. > > -----Original Message----- > From: Greg Harris <greg.har...@aiven.io.INVALID> > Sent: Tuesday, February 18, 2025 1:58 PM > To: dev@kafka.apache.org > Subject: Re: [DISCUSS] KIP-1121: Compression acceleration in Kafka > > Hi Olasoji, > > Thanks for the KIP! I find it addresses many of the concerns I had > with KIP-984. Thank you for following up on that! > > 1. Since the Compression interface is internal currently, it may > contain some components that were not designed with pluggability in > mind, such as the static methods. Are these interfaces being published > as-is, or are some changes going to be made first? > 2. Similarly, the ByteBufferOutputStream, CompressionType, > BufferSupplier, ByteBufferInputStream, ChunkedByteStream, and > GzipOutputStream classes in the signature or implementations of > Compression are not public API currently, and would need to be made > public along with Compression. Are these classes appropriate to add to our > API surface? > 3. Could you adjust the KIP structure/formatting to make the public > API surface area more clear? Some prose in "Proposed Changes" could > become code examples in "Public Interfaces". > 4. What is the significance of this statement: "Any new service > provider should extend the base implementation and thus return the > same CompressionType". Are we expecting that `o instanceof > GzipCompression == o.type().equals(CompressionType.GZIP)` at all > times? If an alternative implementation doesn't need any part of the > upstream implementation, does it make sense to force them to extend one > another? > 5. Is this interface expected to be composed with any other plugin > mix-ins, such as Configurable, Reconfigurable, Closeable, Monitorable, > Versioned, etc? > 6. If there are multiple accelerators present, which one will be chosen? > 7. If the configuration is set but the accelerator implementation > isn't available, does that cause a ConfigException? > 8. Could you expand on this statement: "Due to security and/or > compatibility concerns it is possible to limit this feature to only > load approved service providers (e.g those distributed with Kafka)". > How will this be implemented? What threat model does this assume? > > Thanks, > Greg > > On Tue, Feb 18, 2025 at 11:41 AM Denloye, Olasoji < > olasoji.denl...@intel.com> > wrote: > > > Hi, I would like to bring this back to everyone's attention. This > > KIP introduces a simple framework that enables the use of > > alternative, faster implementations of the existing compression > > codecs (Gzip, Snappy, Zstandard and LZ4) in Kafka. > > > > https://cwiki.apache.org/confluence/x/Xgr0Ew > > > > Thanks > > > > -----Original Message----- > > From: Denloye, Olasoji <olasoji.denl...@intel.com> > > Sent: Thursday, February 13, 2025 1:22 PM > > To: dev@kafka.apache.org > > Subject: RE: [DISCUSS] KIP-1121: Compression acceleration in Kafka > > > > Hi, I would like to bring this back to everyone's attention. I have > > made changes to the proposal to incorporate the ServiceLoader API as > suggested. > > > > https://cwiki.apache.org/confluence/x/Xgr0Ew > > > > Thanks > > > > -----Original Message----- > > From: Denloye, Olasoji > > Sent: Thursday, December 19, 2024 11:01 AM > > To: dev@kafka.apache.org > > Subject: RE: [DISCUSS] KIP-1121: Compression acceleration in Kafka > > > > Hi Frédérik, Thanks for the suggestion to use the existing > > ServiceLoader feature in Java. It could serve as suitable mechanism > > for selecting the best available compression codec. > > > > Thanks > > > > -----Original Message----- > > From: Frédérik Rouleau <froul...@confluent.io.INVALID> > > Sent: Wednesday, December 18, 2024 12:24 AM > > To: dev@kafka.apache.org > > Subject: Re: [DISCUSS] KIP-1121: Compression acceleration in Kafka > > > > Hi Olasoji, > > > > I think that having a way to provide some optimised/accelerated > > implementation is a good idea. > > What not using the ServiceLoader feature of Java? That would allow > > anyone to provide their own optimised implementation depending on > > their infrastructure. We can keep the existing implementation as the > > default and allow Kafka to find a better implementation based on > > Service found. As you mentioned, we might need to enrich the > > interface to allow Kafka to find the best version available at runtime. > > > > Regards, > > > > On Tue, Dec 17, 2024 at 6:36 PM Denloye, Olasoji < > > olasoji.denl...@intel.com> > > wrote: > > > > > Hi Everyone, > > > > > > I would like to start a discussion on KIP-1121: Compression > > > acceleration acceleration in Kafka. KIP-1121: Compression > > > acceleration in Kafka - Apache Kafka - Apache Software Foundation< > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1121%3A+Comp > > > re > > > ss > > > ion+acceleration+in+Kafka > > > > > > > > > >