[GitHub] [drill] cgivre commented on a diff in pull request #2526: DRILL-8204: Allow Provided Schema for HTTP Plugin in JSON Mode

GitBox Mon, 02 May 2022 17:30:57 -0700


cgivre commented on code in PR #2526:
URL: https://github.com/apache/drill/pull/2526#discussion_r863277815



##########
contrib/storage-http/JSON_Options.md:
##########
@@ -0,0 +1,125 @@
+# JSON Options and Configuration 
+
+Drill has a collection of JSON configuration options that control how Drill interprets JSON files.  These are set at the global level; however, the HTTP plugin
+allows you to configure these options individually per connection and override the Drill defaults.  The options are:
+
+* `allowNanInf`:  Configures the connection to interpret `NaN` and `Inf` values
+* `allTextMode`:  By default, Drill attempts to infer data types from JSON data. If the data is malformed, Drill may throw schema change exceptions. If your data is
+  inconsistent, you can enable `allTextMode`. When it is `true`, Drill reads all JSON values as strings rather than trying to infer the data type.
+* `readNumbersAsDouble`:  By default, Drill attempts to distinguish between integers, floating point numbers, and strings.  One challenge is that when data is inconsistent, Drill may
+  throw schema change exceptions. In addition to `allTextMode`, you can make Drill less sensitive by setting `readNumbersAsDouble` to `true`, which causes Drill to read all
+  numeric fields in JSON data as the `double` data type rather than trying to distinguish between ints and doubles.
+* `enableEscapeAnyChar`:  Allows a user to escape any character with a backslash (`\`).
+* `skipMalformedRecords`:  Allows Drill to skip malformed records and recover without throwing exceptions.
+* `skipMalformedDocument`:  Allows Drill to skip entire malformed documents without throwing errors.
+
+All of these can be set by adding a `jsonOptions` block to your connection configuration as shown below:
+
+```json
+
+"jsonOptions": {
+  "allTextMode": true, 
+  "readNumbersAsDouble": true
+}
+
+```
+
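+For context, the following is a minimal sketch of where a `jsonOptions` block might sit inside a single connection. The connection name `example` and the URL are hypothetical, and the surrounding fields (`type`, `connections`, `url`, `method`, `inputType`, `enabled`) are assumed from the usual HTTP plugin layout rather than taken from this change.
+
+```json
+{
+  "type": "http",
+  "connections": {
+    "example": {
+      "url": "https://api.example.com/data",
+      "method": "GET",
+      "inputType": "json",
+      "jsonOptions": {
+        "readNumbersAsDouble": true,
+        "skipMalformedRecords": true
+      }
+    }
+  },
+  "enabled": true
+}
+```
+
+Because the options are scoped to the connection, other connections in the same plugin would presumably continue to use the Drill defaults.
+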
+## Schema Provisioning
+One of the challenges of querying APIs is inconsistent data.  Drill allows you to provide a schema for individual endpoints.  You can do this in one of three ways:
+
+1. By providing a schema inline. See: [Specifying Schema as Table Function Parameter](https://drill.apache.org/docs/plugin-configuration-basics/#specifying-the-schema-as-table-function-parameter)
+2. By providing a schema in the configuration for the endpoint.
+3. By providing a serialized TupleMetadata of the desired schema.  This is advanced functionality and should only be used by experienced Drill users.
+
+Schema provisioning currently supports the complex types Array and Map at any nesting level.
+
+### Example Schema Provisioning:
+```json
+"jsonOptions": {
+  "providedSchema": [

Review Comment:
   One more thing to mention.  The way I implemented it only supports the data 
types that are supported by the JSON reader. 



