cgivre commented on a change in pull request #2129:
URL: https://github.com/apache/drill/pull/2129#discussion_r546085412



##########
File path: contrib/format-xml/README.md
##########
@@ -0,0 +1,76 @@
+# XML Format Reader
+This plugin enables Drill to read XML files without defining any kind of schema.
+
+## Configuration
+Aside from the file extension, there is one configuration option:
+
+* `dataLevel`: XML data often contains a considerable amount of nesting which is not necessarily useful for data analysis. This parameter allows you to set the nesting level
+  where the data actually starts.  The levels start at `1`.
+
+The default configuration is shown below:
+
+```json
+"xml": {
+  "type": "xml",
+  "extensions": [
+    "xml"
+  ],
+  "dataLevel": 2
+}
+```
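
To make the idea of a nesting level concrete, here is a minimal Python sketch. It is not Drill's implementation. It assumes the root element counts as level `1`, and the sample document and the helper `rows_at_level` are hypothetical names introduced only for illustration. Each element found at the chosen level is treated as one row, and its children become string-valued fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: <books> is assumed to be level 1, <book> level 2.
SAMPLE = """<books>
  <book><title>Drill</title><year>2020</year></book>
  <book><title>XML</title><year>2019</year></book>
</books>"""


def rows_at_level(xml_text, data_level):
    """Return one dict per element at nesting depth `data_level` (root = 1)."""
    root = ET.fromstring(xml_text)
    level = [root]
    # Descend one nesting level at a time until the requested depth is reached.
    for _ in range(data_level - 1):
        level = [child for node in level for child in node]
    # Each element at that depth is a row; its children are read as string fields.
    return [{field.tag: field.text for field in row} for row in level]


rows = rows_at_level(SAMPLE, 2)
print(rows)
```

Under these assumptions, a level of `2` skips the `<books>` wrapper and yields one row per `<book>` element.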
+
+## Data Types
+All fields are read as strings.  Nested fields are read as maps.  Future functionality could include support for lists.
+
+## Limitations: Schema Ambiguity
+XML is a challenging format to process as the structure does not give any hints about the schema.  For example, a JSON file might have the following record:
+
+```json
+"record" : {
+  "intField" : 1,
+  "listField" : [1, 2],
+  "otherField" : {
+    "nestedField1" : "foo",
+    "nestedField2" : "bar"
+  }
+}
+```
+
+From this data, it is clear that `listField` is a `list` and `otherField` is a `map`.  This same data could be represented in XML as follows:
+
+```xml
+<record>
+  <intField>1</intField>

Review comment:
       Added language to README




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

