wyndhblb opened a new issue #10930:
URL: https://github.com/apache/druid/issues/10930
### Affected Version
0.20.1
### Description
We have an interesting issue that we've tracked down to the transactional nature
of our Kafka producers. The error halts the entire ingest permanently, and there is
no way to skip past it (which is unfortunate in its own right).
The backtrace is:
```
2021-02-23T22:50:44,663 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
org.apache.kafka.common.KafkaException: Received exception when fetching the next record from test_infra_infrabus_event-50. If needed, please seek past the record to continue consumption.
	at org.apache.kafka.clients.consumer.internals.Fetcher$CompletedFetch.fetchRecords(Fetcher.java:1577) ~[?:?]
	at org.apache.kafka.clients.consumer.internals.Fetcher$CompletedFetch.access$1700(Fetcher.java:1432) ~[?:?]
	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:684) ~[?:?]
	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:635) ~[?:?]
	at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1283) ~[?:?]
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1237) ~[?:?]
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) ~[?:?]
	at org.apache.druid.indexing.kafka.KafkaRecordSupplier.poll(KafkaRecordSupplier.java:124) ~[?:?]
	at org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner.getRecords(IncrementalPublishingKafkaIndexTaskRunner.java:98) ~[?:?]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:603) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:267) [druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:145) [druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:451) [druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:423) [druid-indexing-service-0.20.1.jar:0.20.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
Caused by: org.apache.kafka.common.InvalidRecordException: Incorrect declared batch size, records still remaining in file
2021-02-23T22:50:44,672 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "xxxx_316d619204b9ebf_bljepncn",
  "status" : "FAILED",
  "duration" : 236717,
  "errorMsg" : "org.apache.kafka.common.KafkaException: Received exception when fetching the next record from test_i...",
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
```
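For context on the "seek past the record" escape hatch the Kafka message refers to: a caller that owns the consumer can catch the fetch exception and advance past the corrupt offset, which is exactly what Druid's task runner does not do, hence the permanently stuck ingest. The snippet below is a self-contained toy model of that pattern (not Druid or Kafka client code; `fetch` stands in for a per-offset read, and the real API would be `KafkaConsumer.seek()`):

```java
import java.util.List;
import java.util.function.IntFunction;

public class SkipPastBadRecord {
    // Illustrative only: fetch(offset) may throw for a corrupt record
    // (cf. InvalidRecordException above); the loop skips that offset and
    // continues, returning how many records were successfully consumed.
    static int consumeAll(IntFunction<String> fetch, int start, int end) {
        int consumed = 0;
        for (int offset = start; offset < end; offset++) {
            try {
                fetch.apply(offset);
                consumed++;
            } catch (RuntimeException e) {
                // "seek past the record": fall through and try the next offset
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "CORRUPT", "d");
        IntFunction<String> fetch = o -> {
            if (log.get(o).equals("CORRUPT")) {
                throw new RuntimeException("Incorrect declared batch size");
            }
            return log.get(o);
        };
        // Skips offset 2 and consumes the other 3 records.
        System.out.println(consumeAll(fetch, 0, log.size()));
    }
}
```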
Our hunch is that this may be an aborted producer transaction upstream, though the
exact cause is hard to pin down (it could also be a compression issue truncating a
batch). In any case, to test the transaction theory, `isolation.level = read_uncommitted`
was set in the supervisor spec; however, the setting is not honored by the
downstream peons. Here is a log from the peon:
```
The Druid supervisor spec shows:
"consumerProperties" : {
"bootstrap.servers" : "test-infra-kf.di.infra-prod.asapp.com:9092",
"group.id" : "test-infrabus-druid-minute-1",
"auto.offset.reset" : "earliest",
"max.partition.fetch.bytes" : 10485760,
"fetch.max.bytes" : 52428800,
"isolation.level" : "read_uncommitted"
},
... while the Kafka consumer properties logged by the peon show:
allow.auto.create.topics = true
client.id = consumer-kafka-supervisor-oaiankjn-1
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = kafka-supervisor-oaiankjn
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_committed
...
```
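The symptom above (spec says `read_uncommitted`, peon logs `read_committed`) is what you would see if Druid applied its own consumer defaults *after* the user-supplied `consumerProperties` when building the effective config. We have not confirmed that this is what the Druid code does; the sketch below just demonstrates the merge-order pitfall with plain `java.util.Properties` (all names here are hypothetical):

```java
import java.util.Properties;

public class IsolationOverride {
    // Hypothetical merge: if runtime defaults are put into the map last,
    // they silently clobber the user's consumerProperties.
    static Properties merge(Properties user, Properties defaults) {
        Properties effective = new Properties();
        effective.putAll(user);      // spec's consumerProperties first
        effective.putAll(defaults);  // defaults applied last win -> observed symptom
        return effective;
    }

    public static void main(String[] args) {
        Properties user = new Properties();
        user.setProperty("isolation.level", "read_uncommitted");

        Properties defaults = new Properties();
        defaults.setProperty("isolation.level", "read_committed");

        // Prints read_committed: the user's setting is lost.
        System.out.println(merge(user, defaults).getProperty("isolation.level"));
    }
}
```

If this is what happens in `KafkaRecordSupplier`, the fix would be to apply defaults first and let `consumerProperties` override them, not the reverse.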