Hi Talat,

Have you considered using 10x more topics - perhaps spread across multiple
clusters - and avoiding any filtering in the clients?
--
Igor

On Tue, Nov 30, 2021, at 8:16 PM, Talat Uyarer wrote:
> Hi Eric,
>
> Thanks for your comments. My goal is to apply the filter without any
> serialization.
>
> The producer will record the distinct header values per record batch, and
> the broker will build an index over those values, analogous to the time
> index. When a consumer supplies a filter, the broker will filter only at
> the record-batch level. The filter will not guarantee exact results, but
> it will reduce cost on the consumer side: the consumer still does whatever
> filtering it does today, just over fewer messages.
>
> Do you see any issues? In this model I think the only penalties are an
> additional index file on the broker and a small increase in storage size.
>
> Thanks
>
> On Tue, Nov 30, 2021 at 10:21 AM Eric Azama <eazama...@gmail.com> wrote:
>
>> Something to keep in mind with your proposal is that you're moving the
>> decompression and filtering costs into the brokers. It probably also adds
>> a new compression cost if you want the broker to send compressed data
>> over the network. Centralizing that cost on the cluster may not be
>> desirable and would likely increase latency across the board.
>>
>> Additionally, because header values are byte arrays, the brokers probably
>> would not be able to do very sophisticated filtering. Support for basic
>> comparisons of the built-in Serdes might be simple enough, but anything
>> more complex or involving custom Serdes would probably require a new
>> plug-in type on the broker.
>>
>> On Mon, Nov 29, 2021 at 10:49 AM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>> > Hi All,
>> >
>> > I want to get your advice on one subject. I want to create a KIP for
>> > message-header-based filtering in the Fetch API.
>> >
>> > Our current use case: we have 1k+ topics and, per topic, 10+ consumers
>> > for different use cases. However, all the consumers are interested in
>> > different sets of messages on the same topic.
>> > Currently we read all messages from a given topic and drop logs on the
>> > consumer side. To reduce our stream-processing cost, I want to drop
>> > logs on the broker side. So far my understanding is:
>> >
>> > *Broker sends messages as-is (no serialization cost) -> network
>> > transfer -> consumer deserializes messages (user-side deserialization
>> > cost) -> user space drops or uses messages (user-side filtering cost)*
>> >
>> > If I can drop messages based on their headers without serializing and
>> > deserializing them, it will save us network bandwidth as well as
>> > consumer-side CPU cost.
>> >
>> > My approach is building a header index. Consumer clients will define
>> > their filter in the fetch call. If the filter matches, the broker will
>> > send the messages. I would like to hear your suggestions about my
>> > solution.
>> >
>> > Thanks
>> >
>>
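For readers skimming the thread, the batch-level index Talat describes can be sketched as a small conceptual model. This is not Kafka code: the record/batch shapes, the function names, and the header key "tenant" are all invented here purely for illustration. The point it demonstrates is the trade-off discussed above: the broker serves whole batches whose index matches, so non-matching records can slip through (false positives), but no matching record is ever dropped, and the consumer's exact filter runs over fewer records.

```python
# Conceptual sketch of the proposed batch-level header index (NOT Kafka
# code; data shapes and names are hypothetical).

def build_header_index(batches, header_key):
    # Producer/broker side: per batch, record the set of distinct values
    # of the header, analogous to how the time index summarizes a batch.
    return [{rec["headers"].get(header_key) for rec in batch}
            for batch in batches]

def broker_fetch(batches, index, wanted):
    # Broker side: serve only the batches whose index contains the wanted
    # value. Whole batches are returned, so some non-matching records may
    # still reach the consumer, but matching ones are never skipped.
    return [batch for batch, values in zip(batches, index)
            if wanted in values]

def consumer_filter(batches, header_key, wanted):
    # Consumer side: the exact record-level filter still runs, just over
    # fewer records than a full fetch would deliver.
    return [rec for batch in batches for rec in batch
            if rec["headers"].get(header_key) == wanted]

batches = [
    [{"headers": {"tenant": "a"}, "value": 1},
     {"headers": {"tenant": "b"}, "value": 2}],   # mixed batch
    [{"headers": {"tenant": "b"}, "value": 3}],   # contains no "a"
]
index = build_header_index(batches, "tenant")
served = broker_fetch(batches, index, "a")
exact = consumer_filter(served, "tenant", "a")
```

Note that the broker-side check here is plain value equality, which also illustrates Eric's point: since real header values are byte arrays, simple equality comparison is about the most a broker could index without some pluggable deserialization support.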