[ https://issues.apache.org/jira/browse/KAFKA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151249#comment-14151249 ]
Valentin commented on KAFKA-1655:
---------------------------------

Mailing list thread which led to the creation of this ticket:

{code}
On Sat, 27 Sep 2014 08:31:01 -0700, Jun Rao <jun...@gmail.com> wrote:

  Valentin,

  That's a good point. We didn't have this use case in mind when designing
  the new consumer API. A straightforward implementation could be removing
  the locally cached topic metadata for unsubscribed topics. It's probably
  possible to add a config value to avoid churn in caching the metadata.
  Could you file a jira so that we can track this?

  Thanks,

  Jun

On Thu, Sep 25, 2014 at 4:19 AM, Valentin <kafka-9999...@sblk.de> wrote:

  Hi Jun, Hi Guozhang,

  Hm, yeah, if subscribe/unsubscribe is a smart and lightweight operation,
  this might work. But if it needs to make any additional calls to fetch
  metadata during a subscribe/unsubscribe call, the overhead could get
  quite problematic. The main issue I still see here is that an additional
  layer is added which does not really provide any benefit for a use case
  like mine. I.e. the leader discovery and connection handling you mention
  below don't really offer value in this case: with the connection pooling
  approach suggested, I will have to discover and maintain leader metadata
  in my own code anyway, as well as handle connection pooling myself. So if
  I understand the current plans for the Kafka 0.9 consumer correctly, it
  just doesn't work well for my use case. Sure, there are workarounds to
  make it work in my scenario, but I doubt any of them would scale as well
  as my current SimpleConsumer approach :|
  Or am I missing something here?

  Greetings
  Valentin

On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <jun...@gmail.com> wrote:

  Valentin,

  As Guozhang mentioned, to use the new consumer in the SimpleConsumer way,
  you would subscribe to a set of topic partitions and then issue poll().
  You can change subscriptions on every poll since it's cheap. The benefit
  you get is that it does things like leader discovery and maintaining
  connections to the leader automatically for you.

  In any case, we will leave the old consumer, including the SimpleConsumer,
  for some time even after the new consumer is out.

  Thanks,

  Jun

On Tue, Sep 23, 2014 at 12:23 PM, Valentin <kafka-9999...@sblk.de> wrote:

  Hi Jun,

  Yes, that would theoretically be possible, but it does not scale at all.

  I.e. in the current HTTP REST API use case, I have 5 connection pools on
  every Tomcat server (as I have 5 brokers) and each connection pool holds
  up to 10 SimpleConsumer connections. So all in all I get a maximum of 50
  open connections per web application server. And with that I am able to
  handle most requests from HTTP consumers without having to open/close
  any new connections to a broker host.

  If I were to do the same implementation with the new Kafka 0.9 high level
  consumer, I would end up with >1000 connection pools (as I have >1000
  topic partitions) and each of these connection pools would contain a
  number of consumer connections. So all in all, I would end up with
  thousands of connection objects per application server. Not really a
  viable approach :|

  Currently I am wondering what the rationale is for deprecating the
  SimpleConsumer API, if there are use cases which just work much better
  using it.

  Greetings
  Valentin

On 23/09/14 18:16, Guozhang Wang wrote:

  Hello,

  For your use case, with the new consumer you can still create a new
  consumer instance for each topic / partition, and remember the mapping
  of topic / partition => consumer.
  Then, upon receiving the HTTP request, you can decide which consumer to
  use. Since the new consumer is single threaded, creating this many new
  consumers is roughly the same cost as the old simple consumer.

  Guozhang

On Tue, Sep 23, 2014 at 2:32 AM, Valentin <kafka-9999...@sblk.de> wrote:

  Hi Jun,

  On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <jun...@gmail.com> wrote:

    The new consumer api will also allow you to do what you want in a
    SimpleConsumer (e.g., subscribe to a static set of partitions, control
    initial offsets, etc), only more conveniently.

  Yeah, I have reviewed the available javadocs for the new Kafka 0.9
  consumer APIs. However, while they still allow me to do roughly what I
  want, I fear that they will result in an overall much worse performing
  implementation on my side.

  The main problem I have in my scenario is that consumer requests are
  coming in via stateless HTTP requests (each request is standalone and
  specifies topics+partitions+offsets to read data from) and I need to
  find a good way to do connection pooling to the Kafka backend for good
  performance. The SimpleConsumer allows me to do that; an approach with
  the new Kafka 0.9 consumer API seems to have a lot more overhead.

  Basically, what I am looking for is a way to pool connections per Kafka
  broker host, independent of the topics/partitions/clients/..., so each
  Tomcat app server would keep N disjoint connection pools, if I have N
  Kafka broker hosts.

  I would then keep some central metadata which tells me which hosts are
  the leaders for which topic+partition, and for an incoming HTTP client
  request I'd just take a Kafka connection from the pool for that
  particular broker host, request the data and return the connection to
  the pool. This means that a Kafka broker host will get requests from
  lots of different end consumers via the same TCP connection
  (sequentially, of course).

  With the new Kafka consumer API I would have to subscribe/unsubscribe
  from topics every time I take a connection from the pool, and as the
  request may need to go to a different broker host than the last one,
  that wouldn't even prevent all the connection/reconnection overhead. I
  guess I could create one dedicated connection pool per topic-partition;
  that way connection/reconnection overhead should be minimized, but then
  I'd end up with hundreds of connection pools per app server, also not a
  good approach.

  All in all, the planned design of the new consumer API just doesn't seem
  to fit my use case well. Which is why I am a bit anxious about the
  SimpleConsumer API being deprecated.

  Or am I missing something here? Thanks!
  Greetings
  Valentin
{code}

> Allow high performance SimpleConsumer use cases to still work with new
> Kafka 0.9 consumer APIs
> ----------------------------------------------------------------------
>
>             Key: KAFKA-1655
>             URL: https://issues.apache.org/jira/browse/KAFKA-1655
>         Project: Kafka
>      Issue Type: New Feature
>      Components: consumer
> Affects Versions: 0.9.0
>        Reporter: Valentin
>        Assignee: Neha Narkhede
>
> Hi guys,
> currently Kafka allows consumers to choose either the low level or the
> high level API, depending on the specific requirements of the consumer
> implementation. However, I was told that the current low level API
> (SimpleConsumer) will be deprecated once the new Kafka 0.9 consumer APIs
> are available.
> In this case it would be good if we can ensure that the new API offers
> some way to get similar performance for use cases which perfectly fit the
> old SimpleConsumer API approach.
> Example use case:
> A high throughput HTTP API wrapper for consumer requests which receives
> HTTP REST calls to retrieve data for a specific set of topic partitions
> and offsets. Here the SimpleConsumer is perfect, because it allows
> connection pooling in the HTTP API web application with one pool per
> existing Kafka broker, and the web application can handle the required
> metadata management to know which pool to fetch a connection from for
> each used topic partition. This means connections to Kafka brokers can
> remain open/pooled, and connection/reconnection and metadata overhead is
> minimized.
> To achieve something similar with the new Kafka 0.9 consumer APIs, it
> would be good if it could:
> - provide a low-level call to connect to a specific broker and to read
>   data from a topic+partition+offset
> OR
> - ensure that subscribe/unsubscribe calls are very cheap and can run
>   without requiring any network traffic. If I subscribe to a topic
>   partition for which the same broker is the leader as for the last topic
>   partition that was in use on this consumer API connection, then the
>   consumer API implementation should recognize this and should not do any
>   disconnects/reconnects, just reusing the existing connection to that
>   Kafka broker.
> Or put differently, it should be possible to do external metadata
> handling in the consumer API client, and the client should be able to
> pool consumer API connections effectively by having one pool per Kafka
> broker.
> Greetings
> Valentin

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
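The "one pool per Kafka broker, with external leader metadata" scheme the ticket asks for could be sketched roughly as below. This is a hypothetical illustration only, not Kafka code: the class and method names (`BrokerConnectionPool`, `acquireFor`, `setLeader`) are invented for the sketch, and `PooledConnection` stands in for a real client object such as a `SimpleConsumer`.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Sketch of per-broker connection pooling with externally managed leader
// metadata: each topic-partition is routed to the pool of its leader broker,
// so live connections stay bounded by (brokers x maxPerBroker) no matter how
// many topic partitions exist. All names here are invented for illustration.
public class BrokerConnectionPool {

    public static class PooledConnection {
        public final String brokerHost;
        PooledConnection(String brokerHost) { this.brokerHost = brokerHost; }
    }

    private final Map<String, ArrayDeque<PooledConnection>> pools = new HashMap<>();
    private final Map<String, String> leaderByTopicPartition = new HashMap<>();
    private final int maxPerBroker;

    public BrokerConnectionPool(int maxPerBroker) {
        this.maxPerBroker = maxPerBroker;
    }

    // Externally maintained metadata: which broker leads this topic-partition.
    public synchronized void setLeader(String topicPartition, String brokerHost) {
        leaderByTopicPartition.put(topicPartition, brokerHost);
    }

    // Take a pooled connection for the leader of the given topic-partition,
    // opening a new one only if that broker's pool is empty.
    public synchronized PooledConnection acquireFor(String topicPartition) {
        String broker = leaderByTopicPartition.get(topicPartition);
        if (broker == null) {
            throw new IllegalStateException("no leader metadata for " + topicPartition);
        }
        ArrayDeque<PooledConnection> pool =
            pools.computeIfAbsent(broker, b -> new ArrayDeque<>());
        PooledConnection c = pool.poll();
        return (c != null) ? c : new PooledConnection(broker);
    }

    // Return a connection to its broker's pool; discard it if the pool is full.
    public synchronized void release(PooledConnection c) {
        ArrayDeque<PooledConnection> pool =
            pools.computeIfAbsent(c.brokerHost, b -> new ArrayDeque<>());
        if (pool.size() < maxPerBroker) {
            pool.push(c);
        }
    }
}
```

With something along these lines, two requests for different topic partitions led by the same broker reuse the same pooled connection, which is exactly the property the thread is after; the open question raised by the ticket is whether the new consumer API can play the role of `PooledConnection` without paying subscribe/unsubscribe overhead on every checkout.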