[
https://issues.apache.org/jira/browse/KAFKA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786242#comment-16786242
]
Yiming Zang commented on KAFKA-6020:
------------------------------------
Any updates on this?
We have similar needs on our side and strongly support the idea of broker-side
filtering.
Our use case comes from N-DC replication. Imagine you have 5 data centers and
need to replicate data everywhere: you typically have to run N*(N-1) = 20
mirror-maker jobs in order to replicate the messages in each local data center
to all remote data centers. Each mirror maker has to read all 5 copies of the
events, do some processing, and replicate only one fifth of them. This is a
huge waste of network bandwidth and CPU resources. If we had a way to
pre-filter the events on the broker side, mirror maker would no longer need to
read all 5 copies of the events, which would be a huge saving as we add even
more data centers in the future.
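
To make the arithmetic above concrete, here is a rough sketch of the fan-out
numbers. The class and method names are made up for illustration, and the
generalisation that each job discards (N-1)/N of what it reads is my
assumption, extrapolated from the 5-DC case described above.

```java
// Hypothetical helper, not part of Kafka or MirrorMaker.
final class MirrorFanOut {
    private MirrorFanOut() {}

    /** Number of mirror-maker jobs for full N-DC replication: N*(N-1). */
    static int mirrorJobs(int dataCenters) {
        if (dataCenters < 1) {
            throw new IllegalArgumentException("dataCenters must be >= 1");
        }
        return dataCenters * (dataCenters - 1);
    }

    /**
     * Assumed fraction of consumed events that a single job throws away
     * when it reads all N copies but replicates only one of them.
     */
    static double discardedFraction(int dataCenters) {
        if (dataCenters < 1) {
            throw new IllegalArgumentException("dataCenters must be >= 1");
        }
        return (dataCenters - 1) / (double) dataCenters;
    }
}
```

With 5 data centers this gives 20 jobs, each discarding roughly four fifths of
what it reads, under the assumption stated above.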
> Broker side filtering
> ---------------------
>
> Key: KAFKA-6020
> URL: https://issues.apache.org/jira/browse/KAFKA-6020
> Project: Kafka
> Issue Type: New Feature
> Components: consumer
> Reporter: Pavel Micka
> Priority: Major
> Labels: needs-kip
>
> Currently, it is not possible to filter messages on the broker side. Filtering
> messages on the broker side is convenient for filters with very low selectivity
> (one message in a few thousand). In my case it means transferring several GB of
> data to the consumer, throwing it away, taking one message and doing it again...
> While I understand that filtering by message body is not feasible (for
> performance reasons), I propose to filter just by message key prefix. This
> can be achieved even without any deserialization, as the prefix to be matched
> can be passed as an array (hence the broker would do just array prefix
> compare).
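
The key-prefix comparison proposed in the description could look roughly like
the sketch below. This is only an illustration of comparing raw bytes without
deserialization; the class and method names are hypothetical, and nothing
here is an existing Kafka broker API.

```java
// Hypothetical helper, not part of Kafka: illustrates a raw byte-prefix match.
final class KeyPrefixFilter {
    private KeyPrefixFilter() {}

    /**
     * Returns true if the serialized key starts with the given prefix bytes.
     * No deserialization is needed; only the first prefix.length bytes are read.
     * A null key (Kafka allows keyless messages) never matches.
     */
    static boolean hasPrefix(byte[] key, byte[] prefix) {
        if (key == null || prefix == null || key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The cost per message is bounded by the prefix length rather than the message
size, which is what would keep such a check cheap if it ran on the broker.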
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)