[ 
https://issues.apache.org/jira/browse/GOBBLIN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576035#comment-17576035
 ] 

Bharath Krishna commented on GOBBLIN-1676:
------------------------------------------

Based on the discussion in slack channel, I'll add configurable retries when 
Kafka1 client returns no records while polling.

> Kafka consumer returns 0 records even though there are records to be consumed
> -----------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1676
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1676
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-kafka
>            Reporter: Bharath Krishna
>            Assignee: Shirshanka Das
>            Priority: Major
>
> Mode : Gobblin with map-reduce mode, and extract data from a Kafka source.
> After upgrading to Gobblin {{0.16}} and Kakfka {{1.1}} Client, this issue is 
> observed:
> TLDR: Some mappers exit prematurely, reading 0 records, even though there are 
> records to be read.Debugging details:
> I debugged the Kafka extractor  & client code and figured out a couple of 
> things: 
> {{   1. Kafka1Client}} (and also {{{}Kafka9client{}}}) uses the [Poll() 
> method|https://github.com/apache/gobblin/blob/b400089035fe7ada1a523f9b7e5321e11d46d651/gobblin-modules/gobblin-kafka-1/src/main/java/org/apache/gobblin/kafka/client/Kafka1ConsumerClient.java#L164]
>  to consume records from Kafka topic/partition. But the {{Poll()}} does not 
> guarantee that it will always return the records (at least according to [this 
> Stackoverflow 
> post|https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data]
>  ).
>        2. Kafka extractor at the[ line linked 
> here|https://github.com/apache/gobblin/blob/9ce1e65aa36674742877b5aa2083412d85b5764f/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java#L161]
>  immediately exits when the iterator returned from {{Poll()}} in step 1 is 
> empty, without even checking if there are more records to be read.
> In steps 1 and 2, there are no retries for {{{}Poll(){}}}, or if the poll 
> returns no records, there is no special handling to try polling again.
> Possible Fix:
> I could fix this issue temporarily, by adding a retry logic around {{Poll()}} 
> in Step 1. I added a retry of 3 times, and the mappers are always getting 
> records in the retries.
> I also see some retry logic was present in {{Kafka08Client}} for polling 
> records 
> [here|https://github.com/apache/gobblin/blob/ad32bba684f4c004801b6983bc52c126df2f04b2/gobblin-modules/gobblin-kafka-08/src/main/java/org/apache/gobblin/kafka/client/Kafka08ConsumerClient.java#L248].
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to