[ 
https://issues.apache.org/jira/browse/SPARK-28367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088514#comment-17088514
 ] 

Gabor Somogyi edited comment on SPARK-28367 at 4/22/20, 1:35 PM:
-----------------------------------------------------------------

I've taken a look at the possibilities given by the new API in 
[KIP-396|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484].
 I've found the following problems:
* Consumer properties don't match 100% with AdminClient's, so the Consumer 
properties can't be used for instantiation directly (at first glance I think 
adding this to the Spark API would be overkill)
* With the new API, by using AdminClient, Spark loses the possibility to use the 
assign, subscribe and subscribePattern APIs (implementing this logic would be 
feasible, since the Kafka consumer does it on the client side as well, but it 
would be ugly).
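To illustrate the second point: the consumer's subscribePattern is pure client-side filtering, so it could in principle be re-implemented on top of AdminClient. A minimal sketch of that filtering step (not Spark code; the class and method names are hypothetical, and in a real implementation the topic set would come from AdminClient.listTopics().names() instead of a literal):

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SubscribePatternSketch {
    // Mimics what KafkaConsumer does for subscribe(Pattern):
    // fetch the full topic list, then match the regex client-side.
    static Set<String> matchTopics(Set<String> allTopics, Pattern pattern) {
        return allTopics.stream()
                .filter(t -> pattern.matcher(t).matches())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Placeholder topic list; really AdminClient.listTopics().names().get()
        Set<String> allTopics = Set.of("events-prod", "events-test", "metrics");
        System.out.println(matchTopics(allTopics, Pattern.compile("events-.*")));
        // prints [events-prod, events-test]
    }
}
```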

My main conclusion is that adding AdminClient while keeping Consumer in parallel 
would be super hacky. I would use either the Consumer (which doesn't provide a 
metadata-only API at the moment) or the AdminClient (where it must be checked 
whether all existing features can be covered, plus how to pass in properties).
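On the properties point, one conceivable workaround would be to filter the user-supplied Consumer properties down to the keys AdminClient accepts before instantiation. A rough sketch only (the helper and the hardcoded config set are hypothetical; the real key lists would come from ConsumerConfig.configNames() and AdminClientConfig.configNames() in kafka-clients):

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class AdminClientPropsSketch {
    // Illustrative subset only; really AdminClientConfig.configNames()
    static final Set<String> ADMIN_CLIENT_CONFIGS =
            Set.of("bootstrap.servers", "client.id", "request.timeout.ms");

    // Drop consumer-only properties (group.id, deserializers, ...) so the
    // remaining map could be handed to AdminClient.create() without warnings.
    static Map<String, Object> toAdminClientProps(Map<String, Object> consumerProps) {
        return consumerProps.entrySet().stream()
                .filter(e -> ADMIN_CLIENT_CONFIGS.contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Object> consumerProps = Map.of(
                "bootstrap.servers", "localhost:9092",
                "group.id", "spark-kafka-source",  // consumer-only, dropped
                "key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        System.out.println(toAdminClientProps(consumerProps));
        // prints {bootstrap.servers=localhost:9092}
    }
}
```

The downside, as noted above, is that silently dropping keys changes user-visible behavior, which is why this feels like API overkill rather than a clean fix.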





> Kafka connector infinite wait because metadata never updated
> ------------------------------------------------------------
>
>                 Key: SPARK-28367
>                 URL: https://issues.apache.org/jira/browse/SPARK-28367
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.1.3, 2.2.3, 2.3.3, 2.4.3, 3.0.0
>            Reporter: Gabor Somogyi
>            Priority: Critical
>
> Spark uses an old, deprecated API, poll(long), which never returns and stays 
> in a live lock if the metadata is never updated (for instance when the broker 
> disappears at consumer creation).
> I've created a small standalone application to test it and the alternatives: 
> https://github.com/gaborgsomogyi/kafka-get-assignment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
