jon-wei opened a new pull request, #13089: URL: https://github.com/apache/druid/pull/13089
When JsonInputFormat is used for streaming ingestion, if the input string contains multiple JSON events and a parse exception occurs, all events in the record are discarded, even if some of them were valid (see https://github.com/apache/druid/pull/10383).

This PR adds the following:
- An `assumedNewlineDelimited` option for JsonInputFormat that can be used when the input is known to be newline-delimited JSON. This forces the input format to create a `JsonLineReader`, which parses each line independently, so a parse exception on one line does not prevent other valid lines from being ingested.
- A `useJsonNodeReader` option for JsonInputFormat. If true, instead of creating a `JsonReader`, the input format creates a new `JsonNodeReader` for parsing multi-line JSON. This new parser splits an input string into `JsonNode` objects and parses them one by one into `InputRow`. Valid events found before a parse exception are therefore ingested. Events that follow invalid JSON syntax are still ignored, even if they would otherwise be valid. This also prevents a valid JSON event with an unparseable timestamp from causing other events in the same input string to be discarded.

This PR has:
- [x] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
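The three behaviors described above (discard the whole record, parse line by line, parse node by node) can be illustrated with a small standalone sketch. This is not Druid code: it uses Python's standard `json` module, and the function names `all_or_nothing`, `line_by_line`, `node_by_node`, `split_values`, and `to_row`, as well as the ISO-8601 `timestamp` field, are hypothetical stand-ins chosen only to mirror the semantics described in this PR.

```python
import json
from datetime import datetime

_DECODER = json.JSONDecoder()


def split_values(text):
    """Yield successive top-level JSON values from text.

    Raises json.JSONDecodeError when invalid syntax is reached.
    """
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        value, pos = _DECODER.raw_decode(text, pos)
        yield value


def to_row(event):
    """Hypothetical row conversion; fails if the timestamp is unparseable."""
    return {**event, "timestamp": datetime.fromisoformat(event["timestamp"])}


def all_or_nothing(record):
    """Any failure in the record discards every event in it."""
    try:
        return [to_row(e) for e in split_values(record)]
    except (ValueError, KeyError):
        return []


def line_by_line(record):
    """Assumes newline-delimited JSON: a bad line only loses that line."""
    rows = []
    for line in record.splitlines():
        if not line.strip():
            continue
        try:
            rows.append(to_row(json.loads(line)))
        except (ValueError, KeyError):
            continue
    return rows


def node_by_node(record):
    """Keeps events before a syntax error; skips events with bad timestamps."""
    rows = []
    values = split_values(record)
    while True:
        try:
            event = next(values)
        except StopIteration:
            break
        except json.JSONDecodeError:
            break  # syntax error: later events in this record are lost
        try:
            rows.append(to_row(event))
        except (ValueError, KeyError):
            continue  # bad timestamp: only this event is dropped
    return rows


record = "\n".join([
    '{"timestamp": "2022-09-13T00:00:00", "v": 1}',
    '{"timestamp": "oops", "v": 2}',
    '{"timestamp": "2022-09-13T00:00:02", "v": 3}',
    '{broken',
    '{"timestamp": "2022-09-13T00:00:04", "v": 5}',
])

print([r["v"] for r in all_or_nothing(record)])
print([r["v"] for r in line_by_line(record)])
print([r["v"] for r in node_by_node(record)])
```

In this sketch, the line-based variant recovers the event after the malformed line because each line is parsed in isolation, whereas the node-based variant cannot know where the next value starts once the syntax is broken, so it stops there.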
