Hi Talat,

Have you considered using 10x more topics - perhaps spread across multiple
clusters - and avoiding any filtering in the clients?
--
Igor

On Tue, Nov 30, 2021, at 8:16 PM, Talat Uyarer wrote:
> Hi Eric,
>
> Thanks for your comments. My goal is to apply the filter without any
> serialization.
>
> The producer will record the distinct header values per record batch, and
> the broker will build an index over those values, analogous to the time
> index. When a consumer supplies a filter, the broker will filter only at
> the record-batch level. The filter will not guarantee exact results, but
> it will reduce cost on the consumer side: the consumer still does whatever
> filtering it does today, just over fewer messages.
>
> Do you see any issues? In this model I think the only penalties are an
> additional index file on the broker and a small increase in storage size.
>
> Thanks
>
> On Tue, Nov 30, 2021 at 10:21 AM Eric Azama <eazama...@gmail.com> wrote:
>
>> Something to keep in mind with your proposal is that you're moving the
>> decompression and filtering costs into the brokers. It probably also adds
>> a new compression cost if you want the broker to send compressed data
>> over the network. Centralizing that cost on the cluster may not be
>> desirable and would likely increase latency across the board.
>>
>> Additionally, because header values are byte arrays, the brokers probably
>> would not be able to do very sophisticated filtering. Support for basic
>> comparisons of the built-in Serdes might be simple enough, but anything
>> more complex or involving custom Serdes would probably require a new
>> plug-in type on the broker.
>>
>> On Mon, Nov 29, 2021 at 10:49 AM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>> > Hi All,
>> >
>> > I want to get your advice on one subject. I want to create a KIP for
>> > message-header-based filtering in the Fetch API.
>> >
>> > Our current use case: we have 1k+ topics and, per topic, 10+ consumers
>> > for different use cases. However, all the consumers are interested in
>> > different sets of messages on the same topic.
>> > Currently we read all messages from a given topic and drop logs on the
>> > consumer side. To reduce our stream-processing cost, I want to drop
>> > logs on the broker side. So far my understanding is:
>> >
>> > *Broker sends messages as-is (no serialization cost) -> network
>> > transfer -> consumer deserializes messages (user-side deserialization
>> > cost) -> user space drops or uses messages (user-side filtering cost)*
>> >
>> > If I can drop messages based on their headers without serializing and
>> > deserializing them, it will save us network bandwidth as well as
>> > consumer-side CPU cost.
>> >
>> > My approach is building a header index. Consumer clients will define
>> > their filter in the fetch call. If the filter matches, the broker will
>> > send the messages. I would like to hear your suggestions about my
>> > solution.
>> >
>> > Thanks
>> >
>>
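For readers skimming the thread, the batch-level index Talat describes can be sketched as a small conceptual model. This is not Kafka code: the record/batch shapes, the function names, and the header key "tenant" are all invented here purely for illustration. The point it demonstrates is the trade-off discussed above: the broker serves whole batches whose index matches, so non-matching records can slip through (false positives), but no matching record is ever dropped, and the consumer's exact filter runs over fewer records.

```python
# Conceptual sketch of the proposed batch-level header index (NOT Kafka
# code; data shapes and names are hypothetical).

def build_header_index(batches, header_key):
    # Producer/broker side: per batch, record the set of distinct values
    # of the header, analogous to how the time index summarizes a batch.
    return [{rec["headers"].get(header_key) for rec in batch}
            for batch in batches]

def broker_fetch(batches, index, wanted):
    # Broker side: serve only the batches whose index contains the wanted
    # value. Whole batches are returned, so some non-matching records may
    # still reach the consumer, but matching ones are never skipped.
    return [batch for batch, values in zip(batches, index)
            if wanted in values]

def consumer_filter(batches, header_key, wanted):
    # Consumer side: the exact record-level filter still runs, just over
    # fewer records than a full fetch would deliver.
    return [rec for batch in batches for rec in batch
            if rec["headers"].get(header_key) == wanted]

batches = [
    [{"headers": {"tenant": "a"}, "value": 1},
     {"headers": {"tenant": "b"}, "value": 2}],   # mixed batch
    [{"headers": {"tenant": "b"}, "value": 3}],   # contains no "a"
]
index = build_header_index(batches, "tenant")
served = broker_fetch(batches, index, "a")
exact = consumer_filter(served, "tenant", "a")
```

Note that the broker-side check here is plain value equality, which also illustrates Eric's point: since real header values are byte arrays, simple equality comparison is about the most a broker could index without some pluggable deserialization support.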