[
https://issues.apache.org/jira/browse/GOBBLIN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bharath Krishna updated GOBBLIN-1676:
-------------------------------------
Description:
Mode : Gobblin with map-reduce mode, and extract data from a Kafka source.
After upgrading to Gobblin {{0.16}} and Kakfka {{1.1}} Client, this issue is
observed:
TLDR: Some mappers exit prematurely, reading 0 records, even though there are
records to be read.Debugging details:
I debugged the Kafka extractor & client code and figured out a couple of
things:
{{ 1. Kafka1Client}} (and also {{{}Kafka9client{}}}) uses the [Poll()
method|https://github.com/apache/gobblin/blob/b400089035fe7ada1a523f9b7e5321e11d46d651/gobblin-modules/gobblin-kafka-1/src/main/java/org/apache/gobblin/kafka/client/Kafka1ConsumerClient.java#L164]
to consume records from Kafka topic/partition. But the {{Poll()}} does not
guarantee that it will always return the records (at least according to [this
Stackoverflow
post|https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data]
).
2. Kafka extractor at the[ line linked
here|https://github.com/apache/gobblin/blob/9ce1e65aa36674742877b5aa2083412d85b5764f/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java#L161]
immediately exits when the iterator returned from {{Poll()}} in step 1 is
empty, without even checking if there are more records to be read.
In steps 1 and 2, there are no retries for {{{}Poll(){}}}, or if the poll
returns no records, there is no special handling to try polling again.
Possible Fix:
I could fix this issue temporarily, by adding a retry logic around {{Poll()}}
in Step 1. I added a retry of 3 times, and the mappers are always getting
records in the retries.
I also see some retry logic was present in {{Kafka08Client}} for polling
records
[here|https://github.com/apache/gobblin/blob/ad32bba684f4c004801b6983bc52c126df2f04b2/gobblin-modules/gobblin-kafka-08/src/main/java/org/apache/gobblin/kafka/client/Kafka08ConsumerClient.java#L248].
was:
Mode : Gobblin with map-reduce mode, and extract data from a Kafka source.
After upgrading to Gobblin {{0.16}} and Kakfka {{1.1}} Client, this issue is
observed:
TLDR: Some mappers exit prematurely, reading 0 records, even though there are
records to be read.Debugging details:
I debugged the Kafka extractor & client code and figured out a couple of
things: # {{Kafka1Client}} (and also {{{}Kafka9client{}}}) uses the [Poll()
method|https://github.com/apache/gobblin/blob/b400089035fe7ada1a523f9b7e5321e11d46d651/gobblin-modules/gobblin-kafka-1/src/main/java/org/apache/gobblin/kafka/client/Kafka1ConsumerClient.java#L164]
to consume records from Kafka topic/partition. But the {{Poll()}} does not
guarantee that it will always return the records (at least according to [this
Stackoverflow
post|https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data]
).
# Kafka extractor at the[ line linked
here|https://github.com/apache/gobblin/blob/9ce1e65aa36674742877b5aa2083412d85b5764f/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java#L161]
immediately exits when the iterator returned from {{Poll()}} in step 1 is
empty, without even checking if there are more records to be read.
In steps 1 and 2, there are no retries for {{{}Poll(){}}}, or if the poll
returns no records, there is no special handling to try polling again.
Possible Fix:
I could fix this issue temporarily, by adding a retry logic around {{Poll()}}
in Step 1. I added a retry of 3 times, and the mappers are always getting
records in the retries.
I also see some retry logic was present in {{Kafka08Client}} for polling
records
[here|https://github.com/apache/gobblin/blob/ad32bba684f4c004801b6983bc52c126df2f04b2/gobblin-modules/gobblin-kafka-08/src/main/java/org/apache/gobblin/kafka/client/Kafka08ConsumerClient.java#L248].
> Kafka consumer returns 0 records even though there are records to be consumed
> -----------------------------------------------------------------------------
>
> Key: GOBBLIN-1676
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1676
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-kafka
> Reporter: Bharath Krishna
> Assignee: Shirshanka Das
> Priority: Major
>
> Mode : Gobblin with map-reduce mode, and extract data from a Kafka source.
> After upgrading to Gobblin {{0.16}} and Kakfka {{1.1}} Client, this issue is
> observed:
> TLDR: Some mappers exit prematurely, reading 0 records, even though there are
> records to be read.Debugging details:
> I debugged the Kafka extractor & client code and figured out a couple of
> things:
> {{ 1. Kafka1Client}} (and also {{{}Kafka9client{}}}) uses the [Poll()
> method|https://github.com/apache/gobblin/blob/b400089035fe7ada1a523f9b7e5321e11d46d651/gobblin-modules/gobblin-kafka-1/src/main/java/org/apache/gobblin/kafka/client/Kafka1ConsumerClient.java#L164]
> to consume records from Kafka topic/partition. But the {{Poll()}} does not
> guarantee that it will always return the records (at least according to [this
> Stackoverflow
> post|https://stackoverflow.com/questions/54988037/kafks-consumer-poll-returns-no-data]
> ).
> 2. Kafka extractor at the[ line linked
> here|https://github.com/apache/gobblin/blob/9ce1e65aa36674742877b5aa2083412d85b5764f/gobblin-modules/gobblin-kafka-common/src/main/java/org/apache/gobblin/source/extractor/extract/kafka/KafkaExtractor.java#L161]
> immediately exits when the iterator returned from {{Poll()}} in step 1 is
> empty, without even checking if there are more records to be read.
> In steps 1 and 2, there are no retries for {{{}Poll(){}}}, or if the poll
> returns no records, there is no special handling to try polling again.
> Possible Fix:
> I could fix this issue temporarily, by adding a retry logic around {{Poll()}}
> in Step 1. I added a retry of 3 times, and the mappers are always getting
> records in the retries.
> I also see some retry logic was present in {{Kafka08Client}} for polling
> records
> [here|https://github.com/apache/gobblin/blob/ad32bba684f4c004801b6983bc52c126df2f04b2/gobblin-modules/gobblin-kafka-08/src/main/java/org/apache/gobblin/kafka/client/Kafka08ConsumerClient.java#L248].
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)