paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045

# [DRILL-7683](https://issues.apache.org/jira/browse/DRILL-7683): Add "message parsing" to new JSON loader

## Description

Worked on a project that uses the new JSON loader to parse a REST response that includes a set of "wrapper" fields around the JSON payload. Example:

```
{ "status": "ok", "results": [ data here ] }
```

To solve this cleanly, added the ability to specify a "message parser" that consumes JSON tokens up to the start of the data. This parser can be written as needed for each data source.

While working on the REST data source, it became clear that we need a no-code way to handle the same issue. So, extended the message parser to handle the simple case: a path to the data. In the example above, the path would be just `results`. The path can contain any number of slash-separated elements: `response/body/rows`, for example.

Since this change adds two more parameters to the JSON structure parser, added builders to gather the needed parameters rather than making the constructor even larger.

Note that, aside from the private plugin mentioned above, no other code uses the JSON loader yet.

## Developer Documentation

This PR is part of the "new" V2 EVF-based JSON parser. An example of usage appears in PR #1892 (REST storage plugin).

To use the simple path-based form of message parsing, add the following option to the JSON parser builder:

```
.dataPath("path/to/data")
```

The tail element should be the one that holds an array of JSON records.

To add custom message parsing (to check return status, say), use a different option of the builder:

```
.messageParser(parser)
```

Then implement the `MessageParser` class to do the parsing. The present version works at the level of JSON tokens: you must use the provided "tokenizer" to read each token and decide how to handle it.
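To illustrate what path-based message parsing at the token level involves, here is a minimal sketch in plain Java. It is not Drill's implementation: the `DataPathSketch` class, the `Kind` enum, the `Tok` type, and both methods are hypothetical names made up for this example, and the input is assumed to be an already-tokenized list rather than Drill's tokenizer. The sketch walks a slash-separated path, skips every field that is not on the path, and returns the position of the first token inside the data array.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from Drill's code base.
class DataPathSketch {

  // Simplified token kinds, loosely modeled on a streaming JSON parser.
  enum Kind { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME, VALUE }

  static final class Tok {
    final Kind kind;
    final String name; // set only for FIELD_NAME tokens

    Tok(Kind kind, String name) {
      this.kind = kind;
      this.name = name;
    }

    static Tok of(Kind kind) { return new Tok(kind, null); }

    static Tok field(String name) { return new Tok(Kind.FIELD_NAME, name); }
  }

  /**
   * Returns the index of the first token inside the array found at dataPath,
   * or -1 if the path does not lead to an array.
   */
  static int findDataStart(List<Tok> tokens, String dataPath) {
    String[] path = dataPath.split("/");
    int pos = 0;
    for (String element : path) {
      // Each path element must be looked up inside an object.
      if (pos >= tokens.size() || tokens.get(pos).kind != Kind.START_OBJECT) {
        return -1;
      }
      pos++;
      boolean found = false;
      while (pos < tokens.size() && tokens.get(pos).kind == Kind.FIELD_NAME) {
        String name = tokens.get(pos).name;
        pos++; // now positioned at the field's value
        if (name.equals(element)) {
          found = true;
          break;
        }
        pos = skipValue(tokens, pos); // wrapper field: ignore its value
      }
      if (!found) {
        return -1;
      }
    }
    // The tail element must hold the array of records.
    if (pos < tokens.size() && tokens.get(pos).kind == Kind.START_ARRAY) {
      return pos + 1;
    }
    return -1;
  }

  /** Skips one scalar, object, or array at pos; returns the index just past it. */
  static int skipValue(List<Tok> tokens, int pos) {
    int depth = 0;
    while (pos < tokens.size()) {
      Kind k = tokens.get(pos).kind;
      if (k == Kind.START_OBJECT || k == Kind.START_ARRAY) {
        depth++;
      } else if (k == Kind.END_OBJECT || k == Kind.END_ARRAY) {
        depth--;
      }
      pos++;
      if (depth <= 0) {
        break;
      }
    }
    return pos;
  }
}
```

For the wrapper example in the description, calling `findDataStart` with the path `results` would skip the `status` field and stop just inside the `results` array. A custom parser that also checks the return status would inspect the skipped value instead of discarding it.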
Since working at the token level is tedious, the goal is to provide a ready-made parser that takes a path to the data, such as "response.data", and skips all fields except those in the path. The goal here is to get the mechanism added to the JSON parser so we can then try it in the REST plugin and work out exactly what we need in that higher-level parser.

## User Documentation

N/A

## Testing

Added unit tests. Reran all existing tests.
