[ 
https://issues.apache.org/jira/browse/GOBBLIN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Krishna updated GOBBLIN-1676:
-------------------------------------
    Description: 
Mode : Gobblin with map-reduce mode, and extract data from a Kafka source.
After upgrading to Gobblin {{0.16}} and Kakfka {{1.1}} Client, this issue is 
observed:

TLDR: Some mappers exit prematurely, reading 0 records, even though there are 
records to be read.Debugging details:

I debugged the Kafka extractor  & client code and figured out a couple of 
things: 

{{   1. Kafka1Client}} (and also {{{}Kafka9client{}}}) uses the [Poll() 
method|https://github.com/apache/gobblin/blob/b400089035fe7ada1a523f9b7e5321e11d46d651/gobblin-modules/gobblin-kafka-1/src/main/java/org/apache/gobblin/kafka/client/Kafka1ConsumerClient.java#L164]
 to consume records from Kafka topic/partition. But the {{Poll()}} does not 
guarantee that it will always return the records (at least according to [this 
Stackoverflow 
post|https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data]
 ).

       2. Kafka extractor at the[ line linked 
here|https://github.com/apache/gobblin/blob/9ce1e65aa36674742877b5aa2083412d85b5764f/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java#L161]
 immediately exits when the iterator returned from {{Poll()}} in step 1 is 
empty, without even checking if there are more records to be read.

In steps 1 and 2, there are no retries for {{{}Poll(){}}}, or if the poll 
returns no records, there is no special handling to try polling again.

Possible Fix:
I could fix this issue temporarily, by adding a retry logic around {{Poll()}} 
in Step 1. I added a retry of 3 times, and the mappers are always getting 
records in the retries.
I also see some retry logic was present in {{Kafka08Client}} for polling 
records 
[here|https://github.com/apache/gobblin/blob/ad32bba684f4c004801b6983bc52c126df2f04b2/gobblin-modules/gobblin-kafka-08/src/main/java/org/apache/gobblin/kafka/client/Kafka08ConsumerClient.java#L248].
 

  was:
Mode : Gobblin with map-reduce mode, and extract data from a Kafka source.
After upgrading to Gobblin {{0.16}} and Kakfka {{1.1}} Client, this issue is 
observed:

TLDR: Some mappers exit prematurely, reading 0 records, even though there are 
records to be read.Debugging details:

I debugged the Kafka extractor  & client code and figured out a couple of 
things: # {{Kafka1Client}} (and also {{{}Kafka9client{}}}) uses the [Poll() 
method|https://github.com/apache/gobblin/blob/b400089035fe7ada1a523f9b7e5321e11d46d651/gobblin-modules/gobblin-kafka-1/src/main/java/org/apache/gobblin/kafka/client/Kafka1ConsumerClient.java#L164]
 to consume records from Kafka topic/partition. But the {{Poll()}} does not 
guarantee that it will always return the records (at least according to [this 
Stackoverflow 
post|https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data]
 ).
 # Kafka extractor at the[ line linked 
here|https://github.com/apache/gobblin/blob/9ce1e65aa36674742877b5aa2083412d85b5764f/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java#L161]
 immediately exits when the iterator returned from {{Poll()}} in step 1 is 
empty, without even checking if there are more records to be read.

In steps 1 and 2, there are no retries for {{{}Poll(){}}}, or if the poll 
returns no records, there is no special handling to try polling again.

Possible Fix:
I could fix this issue temporarily, by adding a retry logic around {{Poll()}} 
in Step 1. I added a retry of 3 times, and the mappers are always getting 
records in the retries.
I also see some retry logic was present in {{Kafka08Client}} for polling 
records 
[here|https://github.com/apache/gobblin/blob/ad32bba684f4c004801b6983bc52c126df2f04b2/gobblin-modules/gobblin-kafka-08/src/main/java/org/apache/gobblin/kafka/client/Kafka08ConsumerClient.java#L248].
 


> Kafka consumer returns 0 records even though there are records to be consumed
> -----------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1676
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1676
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-kafka
>            Reporter: Bharath Krishna
>            Assignee: Shirshanka Das
>            Priority: Major
>
> Mode : Gobblin with map-reduce mode, and extract data from a Kafka source.
> After upgrading to Gobblin {{0.16}} and Kakfka {{1.1}} Client, this issue is 
> observed:
> TLDR: Some mappers exit prematurely, reading 0 records, even though there are 
> records to be read.Debugging details:
> I debugged the Kafka extractor  & client code and figured out a couple of 
> things: 
> {{   1. Kafka1Client}} (and also {{{}Kafka9client{}}}) uses the [Poll() 
> method|https://github.com/apache/gobblin/blob/b400089035fe7ada1a523f9b7e5321e11d46d651/gobblin-modules/gobblin-kafka-1/src/main/java/org/apache/gobblin/kafka/client/Kafka1ConsumerClient.java#L164]
>  to consume records from Kafka topic/partition. But the {{Poll()}} does not 
> guarantee that it will always return the records (at least according to [this 
> Stackoverflow 
> post|https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data]
>  ).
>        2. Kafka extractor at the[ line linked 
> here|https://github.com/apache/gobblin/blob/9ce1e65aa36674742877b5aa2083412d85b5764f/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java#L161]
>  immediately exits when the iterator returned from {{Poll()}} in step 1 is 
> empty, without even checking if there are more records to be read.
> In steps 1 and 2, there are no retries for {{{}Poll(){}}}, or if the poll 
> returns no records, there is no special handling to try polling again.
> Possible Fix:
> I could fix this issue temporarily, by adding a retry logic around {{Poll()}} 
> in Step 1. I added a retry of 3 times, and the mappers are always getting 
> records in the retries.
> I also see some retry logic was present in {{Kafka08Client}} for polling 
> records 
> [here|https://github.com/apache/gobblin/blob/ad32bba684f4c004801b6983bc52c126df2f04b2/gobblin-modules/gobblin-kafka-08/src/main/java/org/apache/gobblin/kafka/client/Kafka08ConsumerClient.java#L248].
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to