[ https://issues.apache.org/jira/browse/KAFKA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151249#comment-14151249 ]
Valentin commented on KAFKA-1655:
---------------------------------

Mailing list thread which led to the creation of this ticket:

{code}
On Sat, 27 Sep 2014 08:31:01 -0700, Jun Rao <jun...@gmail.com> wrote:

  Valentin,

  That's a good point. We didn't have this use case in mind when designing
  the new consumer API. A straightforward implementation could be removing
  the locally cached topic metadata for unsubscribed topics. It's probably
  possible to add a config value to avoid churn in caching the metadata.
  Could you file a jira so that we can track this?

  Thanks,

  Jun

On Thu, Sep 25, 2014 at 4:19 AM, Valentin <kafka-9999...@sblk.de> wrote:

  Hi Jun, Hi Guozhang,

  Hm, yeah, if subscribe/unsubscribe is a smart and lightweight operation,
  this might work. But if it needs to make any additional calls to fetch
  metadata during a subscribe/unsubscribe call, the overhead could get
  quite problematic. The main issue I still see here is that an additional
  layer is added which does not really provide any benefit for a use case
  like mine. I.e. the leader discovery and connection handling you mention
  below don't really offer value in this case: with the connection pooling
  approach suggested, I will have to discover and maintain leader metadata
  in my own code anyway, as well as handle connection pooling myself. So if
  I understand the current plans for the Kafka 0.9 consumer correctly, it
  just doesn't work well for my use case. Sure, there are workarounds to
  make it work in my scenario, but I doubt any of them would scale as well
  as my current SimpleConsumer approach :|
  Or am I missing something here?

  Greetings
  Valentin

On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <jun...@gmail.com> wrote:

  Valentin,

  As Guozhang mentioned, to use the new consumer in the SimpleConsumer way,
  you would subscribe to a set of topic partitions and then issue poll().
  You can change subscriptions on every poll since it's cheap. The benefit
  you get is that it does things like leader discovery and maintaining
  connections to the leader automatically for you.

  In any case, we will leave the old consumer, including the SimpleConsumer,
  for some time even after the new consumer is out.

  Thanks,

  Jun

On Tue, Sep 23, 2014 at 12:23 PM, Valentin <kafka-9999...@sblk.de> wrote:

  Hi Jun,

  Yes, that would theoretically be possible, but it does not scale at all.

  I.e. in the current HTTP REST API use case, I have 5 connection pools on
  every Tomcat server (as I have 5 brokers) and each connection pool holds
  up to 10 SimpleConsumer connections. So all in all I get a maximum of 50
  open connections per web application server. And with that I am able to
  handle most requests from HTTP consumers without having to open/close
  any new connections to a broker host.

  If I were to do the same implementation with the new Kafka 0.9 high level
  consumer, I would end up with >1000 connection pools (as I have >1000
  topic partitions) and each of these connection pools would contain a
  number of consumer connections. So all in all, I would end up with
  thousands of connection objects per application server. Not really a
  viable approach :|

  Currently I am wondering what the rationale is for deprecating the
  SimpleConsumer API, if there are use cases which just work much better
  using it.

  Greetings
  Valentin

On 23/09/14 18:16, Guozhang Wang wrote:

  Hello,

  For your use case, with the new consumer you can still create a new
  consumer instance for each topic / partition, and remember the mapping
  of topic / partition => consumer.
  Then, upon receiving the HTTP request, you can decide which consumer to
  use. Since the new consumer is single threaded, creating this many new
  consumers is roughly the same cost as the old simple consumer.

  Guozhang

On Tue, Sep 23, 2014 at 2:32 AM, Valentin <kafka-9999...@sblk.de> wrote:

  Hi Jun,

  On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <jun...@gmail.com> wrote:

    The new consumer api will also allow you to do what you want in a
    SimpleConsumer (e.g., subscribe to a static set of partitions, control
    initial offsets, etc), only more conveniently.

  Yeah, I have reviewed the available javadocs for the new Kafka 0.9
  consumer APIs. However, while they still allow me to do roughly what I
  want, I fear that they will result in an overall much worse performing
  implementation on my side.

  The main problem I have in my scenario is that consumer requests are
  coming in via stateless HTTP requests (each request is standalone and
  specifies topics+partitions+offsets to read data from) and I need to
  find a good way to do connection pooling to the Kafka backend for good
  performance. The SimpleConsumer allows me to do that; an approach with
  the new Kafka 0.9 consumer API seems to have a lot more overhead.

  Basically, what I am looking for is a way to pool connections per Kafka
  broker host, independent of the topics/partitions/clients/..., so each
  Tomcat app server would keep N disjoint connection pools, if I have N
  Kafka broker hosts.

  I would then keep some central metadata which tells me which hosts are
  the leaders for which topic+partition, and for an incoming HTTP client
  request I'd just take a Kafka connection from the pool for that
  particular broker host, request the data and return the connection to
  the pool. This means that a Kafka broker host will get requests from
  lots of different end consumers via the same TCP connection
  (sequentially, of course).

  With the new Kafka consumer API I would have to subscribe/unsubscribe
  from topics every time I take a connection from the pool, and as the
  request may need to go to a different broker host than the last one,
  that wouldn't even prevent all the connection/reconnection overhead. I
  guess I could create one dedicated connection pool per topic-partition;
  that way connection/reconnection overhead should be minimized, but then
  I'd end up with hundreds of connection pools per app server, also not a
  good approach.

  All in all, the planned design of the new consumer API just doesn't seem
  to fit my use case well. Which is why I am a bit anxious about the
  SimpleConsumer API being deprecated.

  Or am I missing something here? Thanks!
  Greetings
  Valentin
{code}

> Allow high performance SimpleConsumer use cases to still work with new
> Kafka 0.9 consumer APIs
> ----------------------------------------------------------------------
>
>             Key: KAFKA-1655
>             URL: https://issues.apache.org/jira/browse/KAFKA-1655
>         Project: Kafka
>      Issue Type: New Feature
>      Components: consumer
> Affects Versions: 0.9.0
>        Reporter: Valentin
>        Assignee: Neha Narkhede
>
> Hi guys,
> currently Kafka allows consumers to choose either the low level or the
> high level API, depending on the specific requirements of the consumer
> implementation. However, I was told that the current low level API
> (SimpleConsumer) will be deprecated once the new Kafka 0.9 consumer APIs
> are available.
> In this case it would be good if we can ensure that the new API offers
> some way to get similar performance for use cases which perfectly fit the
> old SimpleConsumer API approach.
> Example use case:
> A high throughput HTTP API wrapper for consumer requests which receives
> HTTP REST calls to retrieve data for a specific set of topic partitions
> and offsets. Here the SimpleConsumer is perfect, because it allows
> connection pooling in the HTTP API web application with one pool per
> existing Kafka broker, and the web application can handle the required
> metadata management to know which pool to fetch a connection from for
> each used topic partition. This means connections to Kafka brokers can
> remain open/pooled, and connection/reconnection and metadata overhead is
> minimized.
> To achieve something similar with the new Kafka 0.9 consumer APIs, it
> would be good if it could:
> - provide a low-level call to connect to a specific broker and to read
>   data from a topic+partition+offset
> OR
> - ensure that subscribe/unsubscribe calls are very cheap and can run
>   without requiring any network traffic. If I subscribe to a topic
>   partition for which the same broker is the leader as for the last topic
>   partition that was in use on this consumer API connection, then the
>   consumer API implementation should recognize this and should not do any
>   disconnects/reconnects, just reusing the existing connection to that
>   Kafka broker.
> Or put differently, it should be possible to do external metadata
> handling in the consumer API client, and the client should be able to
> pool consumer API connections effectively by having one pool per Kafka
> broker.
> Greetings
> Valentin

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
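The "one pool per Kafka broker, with external leader metadata" scheme the ticket asks for could be sketched roughly as below. This is a hypothetical illustration only, not Kafka code: the class and method names (`BrokerConnectionPool`, `acquireFor`, `setLeader`) are invented for the sketch, and `PooledConnection` stands in for a real client object such as a `SimpleConsumer`.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Sketch of per-broker connection pooling with externally managed leader
// metadata: each topic-partition is routed to the pool of its leader broker,
// so live connections stay bounded by (brokers x maxPerBroker) no matter how
// many topic partitions exist. All names here are invented for illustration.
public class BrokerConnectionPool {

    public static class PooledConnection {
        public final String brokerHost;
        PooledConnection(String brokerHost) { this.brokerHost = brokerHost; }
    }

    private final Map<String, ArrayDeque<PooledConnection>> pools = new HashMap<>();
    private final Map<String, String> leaderByTopicPartition = new HashMap<>();
    private final int maxPerBroker;

    public BrokerConnectionPool(int maxPerBroker) {
        this.maxPerBroker = maxPerBroker;
    }

    // Externally maintained metadata: which broker leads this topic-partition.
    public synchronized void setLeader(String topicPartition, String brokerHost) {
        leaderByTopicPartition.put(topicPartition, brokerHost);
    }

    // Take a pooled connection for the leader of the given topic-partition,
    // opening a new one only if that broker's pool is empty.
    public synchronized PooledConnection acquireFor(String topicPartition) {
        String broker = leaderByTopicPartition.get(topicPartition);
        if (broker == null) {
            throw new IllegalStateException("no leader metadata for " + topicPartition);
        }
        ArrayDeque<PooledConnection> pool =
            pools.computeIfAbsent(broker, b -> new ArrayDeque<>());
        PooledConnection c = pool.poll();
        return (c != null) ? c : new PooledConnection(broker);
    }

    // Return a connection to its broker's pool; discard it if the pool is full.
    public synchronized void release(PooledConnection c) {
        ArrayDeque<PooledConnection> pool =
            pools.computeIfAbsent(c.brokerHost, b -> new ArrayDeque<>());
        if (pool.size() < maxPerBroker) {
            pool.push(c);
        }
    }
}
```

With something along these lines, two requests for different topic partitions led by the same broker reuse the same pooled connection, which is exactly the property the thread is after; the open question raised by the ticket is whether the new consumer API can play the role of `PooledConnection` without paying subscribe/unsubscribe overhead on every checkout.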