ddcprg opened a new issue #11600:
URL: https://github.com/apache/druid/issues/11600
### Description
Given the Avro schema:
```
{
  "type": "record",
  "name": "location",
  "fields": [
    {
      "name": "hilltop",
      "type": {
        "type": "record",
        "name": "anotherNameDescribingType",
        "fields": [
          {
            "name": "timestamp",
            "type": "string",
            "doc": "Local time",
            "default": ""
          },
          {
            "name": "view",
            "type": "string",
            "doc": "doYouSeeWhatISee",
            "default": ""
          }
        ]
      }
    }
  ]
}
```
And the following sequence of Kafka records:
```
{ "hilltop": { "timestamp": "2021-08-17T08:15:51.000", "view": "cloudy" }}
{ "hilltop": { "timestamp": "2021-08-17T16:27:50.000", "view": "amazing" }}
rubbish
{ "hilltop": { "timestamp": "2021-08-17T18:03:52.000", "view": "sunset" }}
```
And the datasource tuning config set to:
```
"tuningConfig": {
  "type": "kafka",
  "reportParseExceptions": false,
  "logParseExceptions": true
}
```
When the third record is processed, the supervisor stops ingesting records
and all of its tasks fail with:
```
org.apache.druid.java.util.common.RE: Failed to get Avro schema: ...
```
### Motivation
I would expect the ingestion task to skip the third record, which is not an
Avro record, log the error, and continue ingesting. Instead, the decoder takes
the first bytes of the message, converts them to an int, and tries to load the
schema with that ID from the schema registry. Because the record is not an
Avro record, no such schema exists, so the `RE` is thrown. The question is
whether the decoder should raise a `ParserException` instead and keep
ingesting the topic.
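The decoding path described above can be sketched as follows. This is a minimal, self-contained Java sketch, not Druid's actual decoder: it assumes the Confluent wire format (a `0x00` magic byte followed by a 4-byte big-endian schema ID), and `WireFormatSketch`, the `SchemaLookup` interface, and the local `ParseException` are hypothetical stand-ins. The magic-byte check is an extra safeguard added for illustration. Both a malformed header and an unknown schema ID surface as a parse error, so the caller can log the bad record and move on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class WireFormatSketch {

    /** Hypothetical stand-in for a recoverable parse error: skip the record, keep ingesting. */
    static class ParseException extends RuntimeException {
        ParseException(String message) {
            super(message);
        }
    }

    /** Hypothetical schema registry client: returns empty when the ID is unknown. */
    interface SchemaLookup {
        Optional<String> schemaFor(int id);
    }

    /** Reads the schema ID, assuming a 0x00 magic byte followed by a 4-byte big-endian ID. */
    static int readSchemaId(ByteBuffer bytes) {
        if (bytes.remaining() < 5) {
            throw new ParseException("Message too short to carry a schema ID");
        }
        byte magic = bytes.get();
        if (magic != 0) {
            // Extra safeguard in this sketch: plain text such as "rubbish" is rejected here
            throw new ParseException("Unexpected magic byte: " + magic);
        }
        return bytes.getInt(); // ByteBuffer reads big-endian by default
    }

    /** Resolves the writer schema, reporting an unknown ID as a parse error rather than a fatal one. */
    static String resolveSchema(ByteBuffer bytes, SchemaLookup registry) {
        int id = readSchemaId(bytes);
        return registry.schemaFor(id)
                .orElseThrow(() -> new ParseException("Failed to get Avro schema: " + id));
    }

    public static void main(String[] args) {
        // Toy registry: only schema ID 1 is registered
        SchemaLookup registry = id -> Optional.ofNullable(Map.of(1, "location").get(id));
        List<byte[]> messages = List.of(
                new byte[] {0, 0, 0, 0, 1},                    // valid header, known ID
                "rubbish".getBytes(StandardCharsets.US_ASCII), // not Avro at all
                new byte[] {0, 0, 0, 0, 99});                  // valid header, unknown ID
        for (byte[] message : messages) {
            try {
                System.out.println("schema: " + resolveSchema(ByteBuffer.wrap(message), registry));
            } catch (ParseException e) {
                // Log and continue, roughly what the tuning config above asks for
                System.out.println("skipped: " + e.getMessage());
            }
        }
    }
}
```

With this shape, a bad record costs one logged line instead of a failed task.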
The current behaviour makes the ingestion tasks fail repeatedly, so the
supervisor never makes further progress.
Arguably, a missing schema should be considered a parsing error since there
is no way to decode the message bytes correctly.
If you agree with changing this behaviour, I'll be happy to raise a PR with
the change. If not, please explain the rationale behind the current behaviour
and how to deal with this scenario.
To keep the code backward compatible with the current behaviour, a new tuning
property could be added, for example:
```
boolean treatMissingSchemaAsParserException
```
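A minimal sketch of how the proposed flag could select between the two behaviours. Everything here is hypothetical: `treatMissingSchemaAsParserException` is only the name suggested above, `SchemaResolver` is an invented class, `FatalDecodeException` stands in for the `RE` thrown today, and the registry is a plain `Map`. Defaulting the flag to `false` keeps the current fail-the-task behaviour.

```java
import java.util.Map;

public class MissingSchemaPolicySketch {

    /** Stand-in for a recoverable parse error: the record is skipped and ingestion continues. */
    static class ParseException extends RuntimeException {
        ParseException(String message) {
            super(message);
        }
    }

    /** Stand-in for the fatal RE currently thrown: the task fails. */
    static class FatalDecodeException extends RuntimeException {
        FatalDecodeException(String message) {
            super(message);
        }
    }

    /** Hypothetical resolver honouring the proposed treatMissingSchemaAsParserException flag. */
    static final class SchemaResolver {
        private final Map<Integer, String> registry;
        private final boolean treatMissingSchemaAsParserException;

        SchemaResolver(Map<Integer, String> registry, boolean treatMissingSchemaAsParserException) {
            this.registry = registry;
            this.treatMissingSchemaAsParserException = treatMissingSchemaAsParserException;
        }

        /** Backward-compatible default: a missing schema stays fatal. */
        SchemaResolver(Map<Integer, String> registry) {
            this(registry, false);
        }

        String resolve(int schemaId) {
            String schema = registry.get(schemaId);
            if (schema != null) {
                return schema;
            }
            String message = "Failed to get Avro schema: " + schemaId;
            if (treatMissingSchemaAsParserException) {
                throw new ParseException(message);   // recoverable: skip the record
            }
            throw new FatalDecodeException(message); // current behaviour: fail the task
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> registry = Map.of(1, "location");
        for (boolean flag : new boolean[] {false, true}) {
            try {
                new SchemaResolver(registry, flag).resolve(42);
            } catch (ParseException e) {
                System.out.println("flag=" + flag + " -> skip record: " + e.getMessage());
            } catch (FatalDecodeException e) {
                System.out.println("flag=" + flag + " -> task fails: " + e.getMessage());
            }
        }
    }
}
```

Only the exception type changes; the message stays the same, so existing log searches for the current error text would still match.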