FrankChen021 opened a new issue #10259:
URL: https://github.com/apache/druid/issues/10259
### Affected Version
- 0.17
- 0.18
- 0.19
### Description
There's a topic in our Kafka cluster that contains messages in **pretty**
(multi-line) JSON format, as shown below. Version 0.19 fails to parse these
messages as JSON objects, while 0.16 works fine.
JSON example
```
{
"byteCount":0,
"partition":0,
"recordAge":0,
"recordCount":0,
"replicationLatency":0,
"targetCluster":"dst",
"timestamp":1597045440490,
"topic":"test"
}
```
0.16


0.19


After changing `Input Format` from the default `Regex` to `Json`, the
following error appears.

# Reason
After comparing the code between 0.16 and 0.19, I found that the problem is
caused by `JsonReader`, which was introduced in 0.17 by #8823.
The new `JsonReader` inherits from `TextReader`, which uses `LineIterator` to
split the input and return the text LINE BY LINE instead of as a whole.
So for multi-line JSON text, each line is handed to the parser separately,
and no single line is a complete JSON object, so parsing fails.
https://github.com/apache/druid/blob/e2487bcc30c5ac0f4281ddd2dcf8906dcd00cba8/core/src/main/java/org/apache/druid/data/input/TextReader.java#L57
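
To illustrate the failure mode without depending on Druid itself, here is a minimal sketch that splits the example message line by line, the way a line-based reader would. The class and method names are hypothetical, and `looksLikeJsonObject` is only a crude stand-in for a real JSON parser (it checks that a fragment opens and closes an object, nothing more).

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of Druid.
final class LineSplitDemo {
    // The pretty-printed message from the description above.
    static final String PRETTY_JSON = String.join("\n",
        "{",
        "\"byteCount\":0,",
        "\"partition\":0,",
        "\"recordAge\":0,",
        "\"recordCount\":0,",
        "\"replicationLatency\":0,",
        "\"targetCluster\":\"dst\",",
        "\"timestamp\":1597045440490,",
        "\"topic\":\"test\"",
        "}");

    // Splits the text the way a line-based reader would: one string per line.
    static List<String> splitIntoLines(String text) {
        return new BufferedReader(new StringReader(text))
            .lines()
            .collect(Collectors.toList());
    }

    // Assumption: a crude substitute for a JSON parser. It only checks that
    // the fragment starts with '{' and ends with '}'.
    static boolean looksLikeJsonObject(String fragment) {
        String trimmed = fragment.trim();
        return trimmed.startsWith("{") && trimmed.endsWith("}");
    }

    public static void main(String[] args) {
        List<String> lines = splitIntoLines(PRETTY_JSON);
        System.out.println("fragments handed to the parser: " + lines.size());
        for (String line : lines) {
            System.out.println(looksLikeJsonObject(line) + "  <-  " + line);
        }
        System.out.println("whole text: " + looksLikeJsonObject(PRETTY_JSON));
    }
}
```

None of the individual lines passes the check, while the unsplit text does, which matches the behavior described above.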
# How to fix
Maybe `JsonReader` should override the `intermediateRowIterator` function
defined in `TextReader` to return an iterator that yields the whole input as
a single string.
@jihoonson please take a look at this bug when it's convenient :)
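
A rough sketch of the idea, using plain JDK types rather than Druid's own interfaces: read the entire stream and expose it through an iterator with exactly one element. `WholeTextIteratorSketch` and `wholeTextIterator` are hypothetical names, and the real `intermediateRowIterator` signature may differ from what is shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical sketch; not Druid's actual API.
final class WholeTextIteratorSketch {
    // Reads the whole stream as UTF-8 and returns an iterator that yields
    // the complete text once, instead of one element per line.
    static Iterator<String> wholeTextIterator(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            return Collections.singletonList(text).iterator();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String message = "{\n\"targetCluster\":\"dst\",\n\"topic\":\"test\"\n}";
        InputStream in = new ByteArrayInputStream(message.getBytes(StandardCharsets.UTF_8));
        Iterator<String> it = wholeTextIterator(in);
        while (it.hasNext()) {
            // The multi-line object arrives as one string, newlines included.
            System.out.println(it.next());
        }
    }
}
```

One trade-off of this approach is that it assumes each input holds a single JSON object; input with one compact object per line would need a different strategy.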