FrankChen021 opened a new issue #10259:
URL: https://github.com/apache/druid/issues/10259


   ### Affected Version
   
   - 0.17
   - 0.18
   - 0.19
   
   ### Description
   
   There's a topic in our Kafka cluster that contains messages in **pretty-printed** (multi-line) JSON format, as shown below. The latest version, 0.19, fails to parse these messages as JSON objects, while 0.16 parses them fine.
   
   JSON example
   
   ```
   {
           "byteCount":0,
           "partition":0,
           "recordAge":0,
           "recordCount":0,
           "replicationLatency":0,
           "targetCluster":"dst",
           "timestamp":1597045440490,
           "topic":"test"
   }
   ```
   
   0.16
   
   
![image](https://user-images.githubusercontent.com/6525742/89770984-28592500-db32-11ea-8389-88d53458f909.png)
   
![image](https://user-images.githubusercontent.com/6525742/89771027-3eff7c00-db32-11ea-97d1-a5484178120a.png)
   
   
   0.19
   
![image](https://user-images.githubusercontent.com/6525742/89771163-7b32dc80-db32-11ea-887f-d235c1da76fb.png)
   
![image](https://user-images.githubusercontent.com/6525742/89771277-a4536d00-db32-11ea-9277-1bf6e91c18c7.png)
   
   After changing `Input Format` from the default `Regex` to `Json`, the following error appears.
   
   
![image](https://user-images.githubusercontent.com/6525742/89771008-3313ba00-db32-11ea-9339-f70cc98db878.png)
   
   
   ### Reason
   
   After comparing the code of 0.16 and 0.19, I found that the problem is caused by `JsonReader`, which was introduced in 0.17 by #8823.
   
   The new `JsonReader` extends `TextReader`, which uses `LineIterator` to split the input and returns the text **line by line** instead of as a whole.
   
   So for multi-line JSON text, each line is parsed separately. No single line is a complete JSON object, so this implementation fails to parse the text as a JSON object.
   
   
https://github.com/apache/druid/blob/e2487bcc30c5ac0f4281ddd2dcf8906dcd00cba8/core/src/main/java/org/apache/druid/data/input/TextReader.java#L57
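   The following minimal, self-contained Java sketch illustrates the failure mode. It is not Druid code: `LineSplitDemo` and `isCompleteObject` are hypothetical names. The brace-balance check is a simplified stand-in for a real JSON parser. It shows that no single line of the pretty-printed example forms a complete object, while the whole text does.

   ```java
   // Hypothetical illustration, not Druid code. isCompleteObject is a simplified
   // stand-in for a JSON parser: it only checks that the text is one
   // brace-balanced object, ignoring braces inside string literals.
   public class LineSplitDemo {

       static boolean isCompleteObject(String text) {
           String t = text.trim();
           if (!t.startsWith("{") || !t.endsWith("}")) {
               return false;
           }
           int depth = 0;
           boolean inString = false;
           boolean escaped = false;
           for (int i = 0; i < t.length(); i++) {
               char c = t.charAt(i);
               if (inString) {
                   // Skip everything inside string literals, honoring escapes.
                   if (escaped) {
                       escaped = false;
                   } else if (c == '\\') {
                       escaped = true;
                   } else if (c == '"') {
                       inString = false;
                   }
                   continue;
               }
               if (c == '"') {
                   inString = true;
               } else if (c == '{') {
                   depth++;
               } else if (c == '}') {
                   depth--;
                   // The outermost object must close only at the very end.
                   if (depth == 0 && i != t.length() - 1) {
                       return false;
                   }
               }
           }
           return depth == 0 && !inString;
       }

       public static void main(String[] args) {
           String pretty = String.join("\n",
               "{",
               "  \"byteCount\":0,",
               "  \"targetCluster\":\"dst\",",
               "  \"timestamp\":1597045440490,",
               "  \"topic\":\"test\"",
               "}");

           // The whole message is one complete object.
           System.out.println(isCompleteObject(pretty)); // true

           // Line by line, as a LineIterator-style reader hands it out:
           // every fragment fails the check.
           for (String line : pretty.split("\n")) {
               System.out.println(isCompleteObject(line) + "\t" + line);
           }
       }
   }
   ```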
   
   ### How to fix
   
   Maybe `JsonReader` should override the `intermediateRowIterator` method defined in `TextReader` so that it returns an iterator over a single string containing the whole input text.
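   The proposed override can be sketched with simplified stand-ins for the real classes. `LineByLineReader` plays the role of `TextReader`, and `WholeTextReader` plays the role of `JsonReader`. Both class names and the `intermediateRowIterator(String)` signature here are hypothetical and are not Druid's actual API.

   ```java
   import java.io.BufferedReader;
   import java.io.IOException;
   import java.io.StringReader;
   import java.io.UncheckedIOException;
   import java.util.Collections;
   import java.util.Iterator;
   import java.util.List;
   import java.util.stream.Collectors;

   // Hypothetical sketch of the proposed fix, not Druid's actual classes.
   public class WholeTextReaderSketch {

       // Stand-in for TextReader: hands out the input line by line.
       static class LineByLineReader {
           Iterator<String> intermediateRowIterator(String input) {
               try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
                   List<String> lines = reader.lines().collect(Collectors.toList());
                   return lines.iterator();
               } catch (IOException e) {
                   throw new UncheckedIOException(e);
               }
           }
       }

       // Stand-in for JsonReader with the proposed override: a single element
       // holding the entire payload, so a pretty-printed object stays intact.
       static class WholeTextReader extends LineByLineReader {
           @Override
           Iterator<String> intermediateRowIterator(String input) {
               return Collections.singletonList(input).iterator();
           }
       }

       // Counts how many intermediate rows a reader produces for the input.
       static int countRows(LineByLineReader reader, String input) {
           int n = 0;
           Iterator<String> it = reader.intermediateRowIterator(input);
           while (it.hasNext()) {
               it.next();
               n++;
           }
           return n;
       }

       public static void main(String[] args) {
           String pretty = "{\n  \"topic\":\"test\",\n  \"partition\":0\n}";
           System.out.println(countRows(new LineByLineReader(), pretty)); // 4
           System.out.println(countRows(new WholeTextReader(), pretty));  // 1
       }
   }
   ```

   Returning the whole payload as one element assumes that each input (for example, one Kafka message) holds a single JSON object. Input containing several concatenated objects would need a streaming parser instead.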
   
   
   @jihoonson, please take a look at this bug when you have time :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.