jon-wei opened a new pull request, #13089:
URL: https://github.com/apache/druid/pull/13089

   When JsonInputFormat is used for streaming ingestion and the input string 
contains multiple JSON events, a parse exception on any one event causes all 
events in the record to be discarded, even if some events were valid (see 
https://github.com/apache/druid/pull/10383).
   
   This PR adds the following:
   - An `assumedNewlineDelimited` option for JsonInputFormat that can be used 
when the input is known to be newline-delimited JSON. This will force the input 
format to create a `JsonLineReader`, which can parse lines independently, so 
that a parse exception on one line will not prevent other valid lines from 
being ingested.
   - A `useJsonNodeReader` option for JsonInputFormat. If true, instead of 
creating a `JsonReader`, the input format will create a new `JsonNodeReader` 
for parsing multi-line JSON. This new parser splits an input string into 
`JsonNode` objects and parses them one by one into `InputRow` objects, so 
valid events found before a parse exception are still ingested. Events that 
follow invalid JSON syntax are still ignored, even if they would otherwise be 
valid. This also prevents a valid JSON event with an unparseable timestamp 
from causing other events in the same input string to be discarded.
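
   The two behaviors described above can be illustrated with a small sketch. 
This is not Druid code: it is a Python analogy that uses only the standard 
`json` module, and the names `parse_lines`, `parse_values`, and `to_row`, as 
well as the `ts` field, are hypothetical. `parse_lines` mimics the per-line 
reader (a bad line does not affect other lines), and `parse_values` mimics the 
node-by-node reader (values before a syntax error are kept, parsing stops at 
the syntax error, and a bad timestamp only drops the event it belongs to).

```python
import json


def to_row(event):
    # Hypothetical row conversion: requires an integer-like "ts" field.
    return {**event, "ts": int(event["ts"])}


def parse_lines(record):
    """Parse each line independently; one bad line does not affect the others."""
    rows, errors = [], []
    for line in record.splitlines():
        if not line.strip():
            continue
        try:
            rows.append(to_row(json.loads(line)))
        except (ValueError, KeyError, TypeError) as e:
            # json.JSONDecodeError is a subclass of ValueError.
            errors.append(str(e))
    return rows, errors


def parse_values(record):
    """Parse concatenated JSON values one at a time.

    Values seen before a syntax error are kept. Parsing stops at the first
    syntax error, because the start of the next value can no longer be found.
    """
    decoder = json.JSONDecoder()
    rows, errors = [], []
    pos, end = 0, len(record)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and record[pos].isspace():
            pos += 1
        if pos >= end:
            break
        try:
            value, pos = decoder.raw_decode(record, pos)
        except json.JSONDecodeError as e:
            errors.append(str(e))
            break  # everything after invalid syntax is ignored
        try:
            rows.append(to_row(value))
        except (ValueError, KeyError, TypeError) as e:
            errors.append(str(e))  # drop only this event, keep going
    return rows, errors
```

   For example, calling `parse_values` on a record that holds a good event, an 
event with an unparseable `ts`, another good event, and then broken syntax 
followed by one more event yields the two good events and two errors; the 
event after the broken syntax is not reached.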
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

