This is an automated email from the ASF dual-hosted git repository.
cgivre pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill.git
The following commit(s) were added to refs/heads/master by this push:
new c4cfe5acfd DRILL-8360: Add Provided Schema for XML Reader (#2710)
c4cfe5acfd is described below
commit c4cfe5acfdede85f4e31bc3398c14dfb2e8a312b
Author: Charles S. Givre <[email protected]>
AuthorDate: Mon Nov 28 12:59:20 2022 -0500
DRILL-8360: Add Provided Schema for XML Reader (#2710)
---
.../drill/exec/store/pdf/PdfBatchReader.java | 4 +-
contrib/format-xml/.gitignore | 2 +
contrib/format-xml/README.md | 35 ++++--
.../drill/exec/store/xml/XMLBatchReader.java | 7 ++
.../org/apache/drill/exec/store/xml/XMLReader.java | 85 +++++++++++++-
.../apache/drill/exec/store/xml/TestXMLReader.java | 35 ++++++
.../src/test/resources/xml/simple_array.xml | 44 +++++++
.../test/resources/xml/simple_with_datatypes.xml | 47 ++++++++
contrib/storage-http/README.md | 54 ++++-----
contrib/storage-http/XML_Options.md | 39 +++++++
.../drill/exec/store/http/HttpApiConfig.java | 34 +++++-
.../drill/exec/store/http/HttpXMLBatchReader.java | 53 ++++++++-
.../drill/exec/store/http/HttpXmlOptions.java | 120 +++++++++++++++++++
.../drill/exec/store/http/util/SimpleHttp.java | 15 +++
.../drill/exec/store/http/TestHttpPlugin.java | 128 ++++++++++++++++++++-
.../drill/exec/store/http/TestPagination.java | 8 +-
.../src/test/resources/data/response.xml | 20 ++--
17 files changed, 668 insertions(+), 62 deletions(-)
diff --git
a/contrib/format-pdf/src/main/java/org/apache/drill/exec/store/pdf/PdfBatchReader.java
b/contrib/format-pdf/src/main/java/org/apache/drill/exec/store/pdf/PdfBatchReader.java
index 94b4caf3bd..fd6cec92e6 100644
---
a/contrib/format-pdf/src/main/java/org/apache/drill/exec/store/pdf/PdfBatchReader.java
+++
b/contrib/format-pdf/src/main/java/org/apache/drill/exec/store/pdf/PdfBatchReader.java
@@ -486,7 +486,9 @@ public class PdfBatchReader implements ManagedReader {
Date parsedDate = simpleDateFormat.parse(cell.getText());
timestamp = Instant.ofEpochMilli(parsedDate.getTime());
} catch (ParseException e) {
- logger.error("Error parsing timestamp: " + e.getMessage());
+ throw UserException.parseError(e)
+ .message("Cannot parse " + cell.getText() + " as a timestamp. You
can specify a format string in the provided schema to correct this.")
+ .build(logger);
}
}
writer.setTimestamp(timestamp);
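The hunk above swaps a silent `logger.error` for a hard failure with actionable guidance. A rough standalone sketch of the same parse-or-fail pattern (hypothetical class; a plain `IllegalStateException` stands in for Drill's `UserException`, and the time zone is pinned to UTC for determinism, which the Drill code does not do):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.TimeZone;

public class TimestampParseSketch {

  // Parse a timestamp with a user-supplied format; fail loudly with a
  // descriptive message instead of logging and silently writing nothing.
  static Instant parseTimestamp(String text, String format) {
    SimpleDateFormat sdf = new SimpleDateFormat(format);
    sdf.setTimeZone(TimeZone.getTimeZone("UTC")); // pinned for determinism in this sketch
    try {
      return Instant.ofEpochMilli(sdf.parse(text).getTime());
    } catch (ParseException e) {
      throw new IllegalStateException("Cannot parse " + text
          + " as a timestamp. You can specify a format string in the provided schema to correct this.", e);
    }
  }
}
```

The design point of the commit is that a bad cell now stops the query with a hint about the fix, rather than leaving a null and a buried log line.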
diff --git a/contrib/format-xml/.gitignore b/contrib/format-xml/.gitignore
new file mode 100644
index 0000000000..9341ff44dc
--- /dev/null
+++ b/contrib/format-xml/.gitignore
@@ -0,0 +1,2 @@
+# Ignore the local logback test configuration
+/src/test/resources/logback-test.xml
diff --git a/contrib/format-xml/README.md b/contrib/format-xml/README.md
index 3c50ce2956..ca32715ee6 100644
--- a/contrib/format-xml/README.md
+++ b/contrib/format-xml/README.md
@@ -1,10 +1,10 @@
# XML Format Reader
-This plugin enables Drill to read XML files without defining any kind of schema.
+This plugin enables Drill to read XML files without defining any kind of schema.
## Configuration
Aside from the file extension, there is one configuration option:
-* `dataLevel`: XML data often contains a considerable amount of nesting which is not necesarily useful for data analysis. This parameter allows you to set the nesting level
+* `dataLevel`: XML data often contains a considerable amount of nesting which is not necessarily useful for data analysis. This parameter allows you to set the nesting level
where the data actually starts. The levels start at `1`.
The default configuration is shown below:
@@ -22,6 +22,21 @@ The default configuration is shown below:
## Data Types
All fields are read as strings. Nested fields are read as maps. Future functionality could include support for lists.
+## Provided Schema
+The XML Format Reader supports provided inline schemas. An example query might be:
+
+```sql
+SELECT * FROM table(cp.`xml/simple_with_datatypes.xml`(type => 'xml',
+ schema => 'inline=(`int_field` INT, `bigint_field` BIGINT,
+ `float_field` FLOAT, `double_field` DOUBLE,
+ `boolean_field` BOOLEAN, `date_field` DATE,
+ `time_field` TIME, `timestamp_field` TIMESTAMP,
+ `string_field` VARCHAR,
+ `date2_field` DATE properties {`drill.format` = `MM/dd/yyyy`})'));
+```
+
+The current implementation only supports a provided schema for scalar data types.
+
### Attributes
XML elements can have attributes, which can also be useful.
```xml
@@ -33,8 +48,8 @@ XML events can have attributes which can also be useful.
</book>
```
-In the example above, the `title` field contains two attributes, the `binding` and `subcategory`. In order to access these fields, Drill creates a map called `attributes` and
-adds an entry for each attribute with the field name and then the attribute name. Every XML file will have a field called `atttributes` regardless of whether the data actually
+In the example above, the `title` field contains two attributes, `binding` and `subcategory`. In order to access these fields, Drill creates a map called `attributes` and
+adds an entry for each attribute with the field name and then the attribute name. Every XML file will have a field called `attributes` regardless of whether the data actually
has attributes or not.
```xml
@@ -65,7 +80,7 @@ has attributes or not.
If you queried this data in Drill you'd get the table below:
```sql
-SELECT *
+SELECT *
FROM <path>.`attributes.xml`
```
@@ -82,7 +97,7 @@ apache drill> select * from dfs.test.`attributes.xml`;
## Limitations: Malformed XML
Drill can read properly formatted XML. If the XML is not properly formatted, Drill will throw errors. Some issues include illegal characters in field names or attribute names.
-Future functionality will include some degree of data cleaning and fault tolerance.
+Future functionality will include some degree of data cleaning and fault tolerance.
## Limitations: Schema Ambiguity
XML is a challenging format to process as the structure does not give any hints about the schema. For example, a JSON file might have the following record:
@@ -126,13 +141,13 @@ This is no problem to parse this data. But consider what would happen if we enco
</otherField>
</record>
```
-In this example, there is no way for Drill to know whether `listField` is a `list` or a `map` because it only has one entry.
+In this example, there is no way for Drill to know whether `listField` is a `list` or a `map` because it only has one entry.
## Future Functionality
* **Build schema from XSD file or link**: One of the major challenges of this reader is having to infer the schema of the data. XML files do provide a schema although this is not
-  required. In the future, if there is interest, we can extend this reader to use an XSD file to build the schema which will be used to parse the actual XML file.
-
+  required. In the future, if there is interest, we can extend this reader to use an XSD file to build the schema which will be used to parse the actual XML file.
+
* **Infer Date Fields**: It may be possible to add the ability to infer date fields.
-* **List Support**: Future functionality may include the ability to infer lists from data structures.
\ No newline at end of file
+* **List Support**: Future functionality may include the ability to infer lists from data structures.
diff --git
a/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLBatchReader.java
b/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLBatchReader.java
index 52a2b6d903..579652a1df 100644
---
a/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLBatchReader.java
+++
b/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLBatchReader.java
@@ -28,6 +28,7 @@ import org.apache.drill.exec.physical.impl.scan.v3.file.FileDescrip;
import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator;
import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
import org.apache.drill.exec.store.dfs.easy.EasySubScan;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -59,6 +60,12 @@ public class XMLBatchReader implements ManagedReader {
dataLevel = readerConfig.dataLevel;
file = negotiator.file();
+ // Add schema if provided
+ if (negotiator.providedSchema() != null) {
+ TupleMetadata schema = negotiator.providedSchema();
+ negotiator.tableSchema(schema, false);
+ }
+
ResultSetLoader loader = negotiator.build();
rootRowWriter = loader.writer();
diff --git
a/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLReader.java
b/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLReader.java
index b3af9d2ea3..8b23ac7621 100644
---
a/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLReader.java
+++
b/contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLReader.java
@@ -31,6 +31,7 @@ import org.apache.drill.exec.record.metadata.SchemaBuilder;
import org.apache.drill.exec.store.ImplicitColumnUtils.ImplicitColumns;
import org.apache.drill.exec.vector.accessor.ScalarWriter;
import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Strings;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -43,6 +44,13 @@ import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.Closeable;
import java.io.InputStream;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
+import java.time.Instant;
+import java.time.LocalDate;
+import java.time.LocalTime;
+import java.time.format.DateTimeFormatter;
+import java.util.Date;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
@@ -177,7 +185,7 @@ public class XMLReader implements Closeable {
currentEvent = nextEvent;
// Process the event
- processEvent(currentEvent, lastEvent);
+ processEvent(currentEvent, lastEvent, reader.peek());
} catch (XMLStreamException e) {
throw UserException
.dataReadError(e)
@@ -195,7 +203,7 @@ public class XMLReader implements Closeable {
 * the self-closing events can cause schema issues with Drill specifically, if a self-closing event
 * is detected prior to a non-self-closing event, and that populated event contains a map or other nested data
 * Drill will throw a schema change exception.
- *
+ * <p>
 * Since Drill uses Java's streaming XML parser, unfortunately, it does not provide a means of identifying
 * self-closing tags. This function does that by comparing the event with the previous event and looking for
 * a condition where one event is a start and the other is an ending event. Additionally, the column number and
@@ -229,7 +237,7 @@ public class XMLReader implements Closeable {
* @param lastEvent The previous event which was processed
*/
private void processEvent(XMLEvent currentEvent,
- XMLEvent lastEvent) {
+ XMLEvent lastEvent, XMLEvent nextEvent) {
String mapName;
switch (currentEvent.getEventType()) {
@@ -282,7 +290,6 @@ public class XMLReader implements Closeable {
attributePrefix = XMLUtils.addField(attributePrefix, fieldName);
}
- @SuppressWarnings("unchecked")
Iterator<Attribute> attributes = startElement.getAttributes();
if (attributes != null && attributes.hasNext()) {
writeAttributes(attributePrefix, attributes);
@@ -428,8 +435,70 @@ public class XMLReader implements Closeable {
index = writer.addColumn(colSchema);
}
ScalarWriter colWriter = writer.scalar(index);
+ ColumnMetadata columnMetadata = writer.tupleSchema().metadata(index);
+ MinorType dataType = columnMetadata.schema().getType().getMinorType();
+ String dateFormat;
+
+    // Write the values depending on their data type. This only applies to scalar fields.
    if (fieldValue != null && (currentState != xmlState.ROW_ENDED && currentState != xmlState.FIELD_ENDED)) {
- colWriter.setString(fieldValue);
+ switch (dataType) {
+ case BIT:
+ colWriter.setBoolean(Boolean.parseBoolean(fieldValue));
+ break;
+ case TINYINT:
+ case SMALLINT:
+ case INT:
+ colWriter.setInt(Integer.parseInt(fieldValue));
+ break;
+ case BIGINT:
+ colWriter.setLong(Long.parseLong(fieldValue));
+ break;
+ case FLOAT4:
+ case FLOAT8:
+ colWriter.setDouble(Double.parseDouble(fieldValue));
+ break;
+ case DATE:
+ dateFormat = columnMetadata.property("drill.format");
+ LocalDate localDate;
+ if (Strings.isNullOrEmpty(dateFormat)) {
+ localDate = LocalDate.parse(fieldValue);
+ } else {
+          localDate = LocalDate.parse(fieldValue, DateTimeFormatter.ofPattern(dateFormat));
+ }
+ colWriter.setDate(localDate);
+ break;
+ case TIME:
+ dateFormat = columnMetadata.property("drill.format");
+ LocalTime localTime;
+ if (Strings.isNullOrEmpty(dateFormat)) {
+ localTime = LocalTime.parse(fieldValue);
+ } else {
+          localTime = LocalTime.parse(fieldValue, DateTimeFormatter.ofPattern(dateFormat));
+ }
+ colWriter.setTime(localTime);
+ break;
+ case TIMESTAMP:
+ dateFormat = columnMetadata.property("drill.format");
+ Instant timestamp;
+ if (Strings.isNullOrEmpty(dateFormat)) {
+ timestamp = Instant.parse(fieldValue);
+ } else {
+ try {
+            SimpleDateFormat simpleDateFormat = new SimpleDateFormat(dateFormat);
+ Date parsedDate = simpleDateFormat.parse(fieldValue);
+ timestamp = Instant.ofEpochMilli(parsedDate.getTime());
+ } catch (ParseException e) {
+ throw UserException.parseError(e)
+ .message("Cannot parse " + fieldValue + " as a timestamp. You
can specify a format string in the provided schema to correct this.")
+ .addContext(errorContext)
+ .build(logger);
+ }
+ }
+ colWriter.setTimestamp(timestamp);
+ break;
+ default:
+ colWriter.setString(fieldValue);
+ }
changeState(xmlState.FIELD_ENDED);
}
}
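The DATE, TIME, and TIMESTAMP branches above share one pattern: parse as ISO-8601 by default, or with an explicit pattern when the provided schema carries a `drill.format` property. A minimal standalone sketch of that pattern using only `java.time` (the class and method names here are illustrative, not Drill's API):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateFieldSketch {

  // Parse a DATE field value: ISO-8601 (e.g. 2022-03-02) by default, or an
  // explicit pattern when the schema carries a "drill.format" property.
  static LocalDate parseDate(String value, String format) {
    if (format == null || format.isEmpty()) {
      return LocalDate.parse(value);
    }
    return LocalDate.parse(value, DateTimeFormatter.ofPattern(format));
  }
}
```

With the `MM/dd/yyyy` pattern from the example schema, the value `03/02/2022` from the test data parses to the same date as the ISO string `2022-03-02`.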
@@ -491,7 +560,11 @@ public class XMLReader implements Closeable {
}
private TupleWriter getAttributeWriter() {
-    int attributeIndex = rootRowWriter.addColumn(SchemaBuilder.columnSchema(ATTRIBUTE_MAP_NAME, MinorType.MAP, DataMode.REQUIRED));
+ int attributeIndex = rootRowWriter.tupleSchema().index(ATTRIBUTE_MAP_NAME);
+
+ if (attributeIndex == -1) {
+      attributeIndex = rootRowWriter.addColumn(SchemaBuilder.columnSchema(ATTRIBUTE_MAP_NAME, MinorType.MAP, DataMode.REQUIRED));
+ }
return rootRowWriter.tuple(attributeIndex);
}
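The `getAttributeWriter()` change above is a look-up-before-add guard: with a provided schema in play, the `attributes` map column may already exist, so the column is added only when the name lookup returns `-1`. The shape of that fix, sketched against a plain list rather than Drill's `RowSetLoader` (hypothetical class for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnRegistrySketch {
  private final List<String> columns = new ArrayList<>();

  // Return the index of the named column, adding it only when absent;
  // mirrors the index(...) == -1 check in getAttributeWriter().
  int getOrAddColumn(String name) {
    int index = columns.indexOf(name); // -1 when not yet present
    if (index == -1) {
      columns.add(name);
      index = columns.size() - 1;
    }
    return index;
  }

  int columnCount() {
    return columns.size();
  }
}
```

Calling the method twice with the same name returns the same index and leaves a single column registered, which is exactly what the fixed reader relies on.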
diff --git
a/contrib/format-xml/src/test/java/org/apache/drill/exec/store/xml/TestXMLReader.java
b/contrib/format-xml/src/test/java/org/apache/drill/exec/store/xml/TestXMLReader.java
index 6a9fc11bf4..260fe9c3cb 100644
---
a/contrib/format-xml/src/test/java/org/apache/drill/exec/store/xml/TestXMLReader.java
+++
b/contrib/format-xml/src/test/java/org/apache/drill/exec/store/xml/TestXMLReader.java
@@ -32,6 +32,9 @@ import org.junit.Test;
import org.junit.experimental.categories.Category;
import java.nio.file.Paths;
+import java.time.Instant;
+import java.time.LocalDate;
+import java.time.LocalTime;
import static org.apache.drill.test.QueryTestUtil.generateCompressedFile;
import static org.apache.drill.test.rowSet.RowSetUtilities.mapArray;
@@ -83,6 +86,38 @@ public class TestXMLReader extends ClusterTest {
new RowSetComparison(expected).verifyAndClearAll(results);
}
+ @Test
+ public void testSimpleProvidedSchema() throws Exception {
+ String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` (type
=> 'xml', schema " +
+ "=> 'inline=(`int_field` INT, `bigint_field` BIGINT, `float_field`
FLOAT, `double_field` DOUBLE, `boolean_field` " +
+ "BOOLEAN, `date_field` DATE, `time_field` TIME, `timestamp_field`
TIMESTAMP, `string_field`" +
+ " VARCHAR, `date2_field` DATE properties {`drill.format` =
`MM/dd/yyyy`})'))";
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+ assertEquals(2, results.rowCount());
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("int_field", MinorType.INT)
+ .addNullable("bigint_field", MinorType.BIGINT)
+ .addNullable("float_field", MinorType.FLOAT4)
+ .addNullable("double_field", MinorType.FLOAT8)
+ .addNullable("boolean_field", MinorType.BIT)
+ .addNullable("date_field", MinorType.DATE)
+ .addNullable("time_field", MinorType.TIME)
+ .addNullable("timestamp_field", MinorType.TIMESTAMP)
+ .addNullable("string_field", MinorType.VARCHAR)
+ .addNullable("date2_field", MinorType.DATE)
+ .add("attributes", MinorType.MAP)
+ .buildSchema();
+
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+      .addRow(1, 1000L, 1.2999999523162842, 3.3, true, LocalDate.parse("2022-01-01"), LocalTime.parse("12:04:34"), Instant.parse("2022-01-06T12:30:30Z"), "string", LocalDate.parse("2022-03-02"), mapArray())
+      .addRow(2, 2000L, 2.299999952316284, 4.3, false, LocalDate.parse("2022-02-01"), LocalTime.parse("13:04:34"), Instant.parse("2022-03-06T12:30:30Z"), null, LocalDate.parse("2022-03-01"), mapArray())
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+
@Test
public void testSelfClosingTags() throws Exception {
String sql = "SELECT * FROM cp.`xml/weather.xml`";
diff --git a/contrib/format-xml/src/test/resources/xml/simple_array.xml
b/contrib/format-xml/src/test/resources/xml/simple_array.xml
new file mode 100644
index 0000000000..c734f3a559
--- /dev/null
+++ b/contrib/format-xml/src/test/resources/xml/simple_array.xml
@@ -0,0 +1,44 @@
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+-->
+<dependencies>
+ <dependency>
+ <groupId>org.apache.drill.exec</groupId>
+ <array_field>
+ <value>1</value>
+ <value>2</value>
+ <value>3</value>
+ </array_field>
+ </dependency>
+
+ <dependency>
+ <groupId>org.apache.drill.exec</groupId>
+ <scope>test</scope>
+ <array_field>
+ <value>4</value>
+ <value>5</value>
+ <value>6</value>
+ </array_field>
+ </dependency>
+
+ <dependency>
+ <groupId>org.apache.drill</groupId>
+ <scope>test</scope>
+ </dependency>
+</dependencies>
diff --git
a/contrib/format-xml/src/test/resources/xml/simple_with_datatypes.xml
b/contrib/format-xml/src/test/resources/xml/simple_with_datatypes.xml
new file mode 100644
index 0000000000..92f6296040
--- /dev/null
+++ b/contrib/format-xml/src/test/resources/xml/simple_with_datatypes.xml
@@ -0,0 +1,47 @@
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+-->
+
+<data>
+ <row>
+ <int_field>1</int_field>
+ <bigint_field>1000</bigint_field>
+ <float_field>1.3</float_field>
+ <double_field>3.3</double_field>
+ <boolean_field>true</boolean_field>
+ <date_field>2022-01-01</date_field>
+ <date2_field>03/02/2022</date2_field>
+ <time_field>12:04:34</time_field>
+ <timestamp_field>2022-01-06T12:30:30Z</timestamp_field>
+ <string_field>string</string_field>
+ </row>
+
+ <row>
+ <int_field>2</int_field>
+ <bigint_field>2000</bigint_field>
+ <float_field>2.3</float_field>
+ <double_field>4.3</double_field>
+ <boolean_field>false</boolean_field>
+ <date_field>2022-02-01</date_field>
+ <date2_field>03/01/2022</date2_field>
+ <time_field>13:04:34</time_field>
+ <timestamp_field>2022-03-06T12:30:30Z</timestamp_field>
+ </row>
+
+</data>
diff --git a/contrib/storage-http/README.md b/contrib/storage-http/README.md
index d384bedd11..797ab5f407 100644
--- a/contrib/storage-http/README.md
+++ b/contrib/storage-http/README.md
@@ -49,23 +49,23 @@ The `connection` property can accept the following options.
Many APIs require parameters to be passed directly in the URL instead of as query arguments. For example, GitHub's API allows you to query an organization's repositories with the following URL: https://github.com/orgs/{org}/repos
-As of Drill 1.20.0, you can simply set the URL in the connection using the curly braces. If your API includes URL parameters you must include them in the `WHERE` clause in your
+As of Drill 1.20.0, you can simply set the URL in the connection using the curly braces. If your API includes URL parameters, you must include them in the `WHERE` clause in your
query, or specify a default value in the configuration.
As an example, for the API above, you would have to query as shown below:
```sql
-SELECT *
+SELECT *
FROM api.github
WHERE org = 'apache'
```
This query would replace the `org` in the URL with the value from the `WHERE` clause, in this case `apache`. You can specify a default value as follows: `https://someapi.com/
-{param1}/{param2=default}`. In this case, the default would be used if and only if there isn't a parameter supplied in the query.
+{param1}/{param2=default}`. In this case, the default would be used if and only if there isn't a parameter supplied in the query.
#### Limitations on URL Parameters
-* Drill does not support boolean expressions of URL parameters in queries. For instance, for the above example, if you were to include `WHERE org='apache' OR org='linux'`,
-  these parameters could not be pushed down in the current state.
+* Drill does not support boolean expressions of URL parameters in queries. For instance, in the above example, if you were to include `WHERE org='apache' OR org='linux'`,
+  these parameters could not be pushed down in the current state.
* All URL parameter clauses must be equality only.
### Passing Parameters in the Query
@@ -141,6 +141,7 @@ key2=value2"
* `query_string`: Parameters from the query are pushed down to the query string. Static parameters are pushed to the post body.
* `post_body`: Both static and parameters from the query are pushed to the post body as key/value pairs
* `json_body`: Both static and parameters from the query are pushed to the post body as json.
+* `xml_body`: Both static and parameters from the query are pushed to the post body as XML.
#### Headers
@@ -245,13 +246,14 @@ as that shown above. Drill assumes that the server will
uses HTTP status codes t
indicate a bad request or other error.
#### Input Type
-The REST plugin accepts three different types of input: `json`, `csv` and `xml`. The default is `json`. If you are using `XML` as a data type, there is an additional
-configuration option called `xmlDataLevel` which reduces the level of unneeded nesting found in XML files. You can find more information in the documentation for Drill's XML
-format plugin.
+The REST plugin accepts three different types of input: `json`, `csv` and `xml`. The default is `json`.
#### JSON Configuration
[Read the documentation for configuring json options, including schema provisioning.](JSON_Options.md)
+#### XML Configuration
+[Read the documentation for configuring XML options, including schema provisioning.](XML_Options.md)
+
#### Authorization
`authType`: If your API requires authentication, specify the authentication
@@ -263,8 +265,8 @@ If the `authType` is set to `basic`, `username` and
`password` must be set in th
`password`: The password for basic authentication.
##### Global Credentials
-If you have an HTTP plugin with multiple endpoints that all use the same credentials, you can set the `authType` to `basic` and set global
-credentials in the storage plugin configuration.
+If you have an HTTP plugin with multiple endpoints that all use the same credentials, you can set the `authType` to `basic` and set global
+credentials in the storage plugin configuration.
Simply add the following to the storage plugin configuration:
```json
@@ -280,12 +282,12 @@ Note that the `authType` still must be set to `basic` and
that any endpoint cred
#### Limiting Results
Some APIs support a query parameter which is used to limit the number of results returned by the API. In this case you can set the `limitQueryParam` config variable to the query parameter name and Drill will automatically include it in your query. For instance, if you have an API which supports a limit query parameter called `maxRecords`, you set the above config variable, and you execute the following query:
-
+
```sql
SELECT <fields>
FROM api.limitedApi
-LIMIT 10
-```
+LIMIT 10
+```
Drill will send the following request to your API:
```
https://<api>?maxRecords=10
@@ -298,12 +300,12 @@ If the API which you are querying requires OAuth2.0 for
authentication [read the
If you want to use automatic pagination in Drill, [click here to read the
documentation for pagination](Pagination.md).
#### errorOn400
-When a user makes HTTP calls, the response code will be from 100-599. 400 series error codes can contain useful information and in some cases you would not want Drill to throw
-errors on 400 series errors. This option allows you to define Drill's behavior on 400 series error codes. When set to `true`, Drill will throw an exception and halt execution
+When a user makes HTTP calls, the response code will be from 100-599. 400 series error codes can contain useful information and in some cases you would not want Drill to throw
+errors on 400 series errors. This option allows you to define Drill's behavior on 400 series error codes. When set to `true`, Drill will throw an exception and halt execution
on 400 series errors, `false` will return an empty result set (with implicit fields populated).
#### verifySSLCert
-Default is `true`, but when set to false, Drill will trust all SSL certificates. Useful for debugging or on internal corporate networks using self-signed certificates or
+Default is `true`, but when set to false, Drill will trust all SSL certificates. Useful for debugging or on internal corporate networks using self-signed certificates or
private certificate authorities.
#### caseSensitiveFilters
@@ -447,7 +449,7 @@ To query this API, set the configuration as follows:
"authType": "none",
"userName": null,
"password": null,
- "postBody": null,
+ "postBody": null,
"inputType": "json",
"errorOn400": true
}
@@ -495,7 +497,7 @@ body. Set the configuration as follows:
"authType": "none",
"userName": null,
"password": null,
- "postBody": null,
+ "postBody": null,
"errorOn400": true
}
}
@@ -641,24 +643,24 @@ The HTTP plugin includes four implicit fields which can
be used for debugging.
* `_response_code`: The response code from the HTTP request. This field is an `INT`.
* `_response_message`: The response message.
* `_response_protocol`: The response protocol.
-* `_response_url`: The actual URL sent to the API.
+* `_response_url`: The actual URL sent to the API.
## Joining Data
-There are some situations where a user might want to join data with an API result and the pushdowns prevent that from happening. The main situation where this happens is when
-an API has parameters which are part of the URL AND these parameters are dynamically populated via a join.
+There are some situations where a user might want to join data with an API result and the pushdowns prevent that from happening. The main situation where this happens is when
+an API has parameters which are part of the URL AND these parameters are dynamically populated via a join.
-In this case, there are two functions `http_get_url` and `http_get` which you can use to faciliate these joins.
+In this case, there are two functions, `http_request` and `http_get`, which you can use to facilitate these joins.
* `http_request('<storage_plugin_name>', <params>)`: This function accepts a storage plugin as input and an optional list of parameters to include in a URL.
-* `http_get(<url>, <params>)`: This function works in the same way except that it does not pull any configuration information from existing storage plugins. The input url for
-  the `http_get` function must be a valid URL.
+* `http_get(<url>, <params>)`: This function works in the same way except that it does not pull any configuration information from existing storage plugins. The input URL for
+  the `http_get` function must be a valid URL.
### Example Queries
-Let's say that you have a storage plugin called `github` with an endpoint called `repos` which points to the url: https://github.com/orgs/{org}/repos. It is easy enough to
+Let's say that you have a storage plugin called `github` with an endpoint called `repos` which points to the URL https://github.com/orgs/{org}/repos. It is easy enough to
write a query like this:
```sql
-SELECT *
+SELECT *
FROM github.repos
WHERE org='apache'
```
diff --git a/contrib/storage-http/XML_Options.md
b/contrib/storage-http/XML_Options.md
new file mode 100644
index 0000000000..e53e1e8e99
--- /dev/null
+++ b/contrib/storage-http/XML_Options.md
@@ -0,0 +1,39 @@
+# XML Options
+Drill has several XML configuration options that allow you to configure how Drill interprets XML files.
+
+## DataLevel
+XML data often contains a considerable amount of nesting which is not necessarily useful for data analysis. This parameter allows you to set the nesting level
+where the data actually starts. The levels start at `1`.
+
+## Schema Provisioning
+One of the challenges of querying APIs is inconsistent data. Drill allows you to provide a schema for individual endpoints. You can do this in one of two ways:
+
+1. By providing a schema inline. [See: Specifying the Schema as a Table Function Parameter](https://drill.apache.org/docs/plugin-configuration-basics/#specifying-the-schema-as-table-function-parameter)
+2. By providing a schema in the configuration for the endpoint.
+
+Note: At the time of writing, Drill's XML reader only supports provided schemas with scalar data types.
+
+## Example Configuration
+You can set either of these options on a per-endpoint basis as shown below:
+
+```json
+"xmlOptions": {
+ "dataLevel": 1
+}
+```
+
+Or,
+```json
+"xmlOptions": {
+ "dataLevel": 2,
+ "schema": {
+ "type": "tuple_schema",
+ "columns": [
+ {
+ "name": "custom_field",
+ "type": "VARCHAR
+ }
+ ]
+ }
+}
+```
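For orientation, here is a sketch of how the new `xmlOptions` block might sit inside a full HTTP endpoint configuration; the surrounding fields follow the endpoint examples in README.md, and the URL is a placeholder:

```json
{
  "url": "https://example.com/api/data.xml",
  "method": "GET",
  "inputType": "xml",
  "xmlOptions": {
    "dataLevel": 2
  }
}
```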
diff --git
a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpApiConfig.java
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpApiConfig.java
index 91af33e36f..32efdbf559 100644
---
a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpApiConfig.java
+++
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpApiConfig.java
@@ -95,6 +95,7 @@ public class HttpApiConfig {
@JsonProperty
private final String inputType;
@JsonProperty
+ @Deprecated
private final int xmlDataLevel;
@JsonProperty
private final String limitQueryParam;
@@ -111,6 +112,9 @@ public class HttpApiConfig {
@JsonProperty
private final HttpJsonOptions jsonOptions;
+ @JsonProperty
+ private final HttpXmlOptions xmlOptions;
+
@JsonInclude
@JsonProperty
private final boolean verifySSLCert;
@@ -164,6 +168,7 @@ public class HttpApiConfig {
return this.caseSensitiveFilters;
}
+ @Deprecated
public int xmlDataLevel() {
return this.xmlDataLevel;
}
@@ -179,6 +184,9 @@ public class HttpApiConfig {
public HttpJsonOptions jsonOptions() {
return this.jsonOptions;
}
+ public HttpXmlOptions xmlOptions() {
+ return this.xmlOptions;
+ }
public boolean verifySSLCert() {
return this.verifySSLCert;
@@ -202,7 +210,6 @@ public class HttpApiConfig {
}
HttpApiConfig that = (HttpApiConfig) o;
return requireTail == that.requireTail
- && xmlDataLevel == that.xmlDataLevel
&& errorOn400 == that.errorOn400
&& verifySSLCert == that.verifySSLCert
&& directCredentials == that.directCredentials
@@ -218,6 +225,7 @@ public class HttpApiConfig {
&& Objects.equals(inputType, that.inputType)
&& Objects.equals(limitQueryParam, that.limitQueryParam)
&& Objects.equals(jsonOptions, that.jsonOptions)
+ && Objects.equals(xmlOptions, that.xmlOptions)
&& Objects.equals(credentialsProvider, that.credentialsProvider)
&& Objects.equals(paginator, that.paginator);
}
@@ -225,7 +233,7 @@ public class HttpApiConfig {
@Override
public int hashCode() {
return Objects.hash(url, requireTail, method, postBody, headers, params,
dataPath,
- authType, inputType, xmlDataLevel, limitQueryParam, errorOn400,
jsonOptions, verifySSLCert,
+ authType, inputType, limitQueryParam, errorOn400, jsonOptions,
xmlOptions, verifySSLCert,
credentialsProvider, paginator, directCredentials,
postParameterLocation, caseSensitiveFilters);
}
@@ -243,10 +251,10 @@ public class HttpApiConfig {
.field("caseSensitiveFilters", caseSensitiveFilters)
.field("authType", authType)
.field("inputType", inputType)
- .field("xmlDataLevel", xmlDataLevel)
.field("limitQueryParam", limitQueryParam)
.field("errorOn400", errorOn400)
.field("jsonOptions", jsonOptions)
+ .field("xmlOptions", xmlOptions)
.field("verifySSLCert", verifySSLCert)
.field("credentialsProvider", credentialsProvider)
.field("paginator", paginator)
@@ -272,7 +280,12 @@ public class HttpApiConfig {
* All POST parameters, both static and from the query, are pushed to the
POST body
* as a JSON object.
*/
- JSON_BODY
+ JSON_BODY,
+ /**
+ * All POST parameters, both static and from the query, are pushed to the
POST body
+ * as an XML request.
+ */
+ XML_BODY
}
public enum HttpMethod {
@@ -292,6 +305,7 @@ public class HttpApiConfig {
? HttpMethod.GET.toString() : builder.method.trim().toUpperCase();
this.url = builder.url;
this.jsonOptions = builder.jsonOptions;
+ this.xmlOptions = builder.xmlOptions;
HttpMethod httpMethod = HttpMethod.valueOf(this.method);
// Get the request method. Only accept GET and POST requests. Anything
else will default to GET.
@@ -438,6 +452,7 @@ public class HttpApiConfig {
private boolean errorOn400;
private HttpJsonOptions jsonOptions;
+ private HttpXmlOptions xmlOptions;
private CredentialsProvider credentialsProvider;
@@ -479,6 +494,11 @@ public class HttpApiConfig {
return this;
}
+ public HttpApiConfigBuilder xmlOptions(HttpXmlOptions options) {
+ this.xmlOptions = options;
+ return this;
+ }
+
public HttpApiConfigBuilder requireTail(boolean requireTail) {
this.requireTail = requireTail;
return this;
@@ -539,6 +559,12 @@ public class HttpApiConfig {
return this;
}
+ /**
+ * Deprecated. Use {@link #xmlOptions(HttpXmlOptions)} instead to set the
+ * XML data level.
+ * @param xmlDataLevel the nesting level at which the XML data begins
+ * @return this builder
+ */
+ @Deprecated
public HttpApiConfigBuilder xmlDataLevel(int xmlDataLevel) {
this.xmlDataLevel = xmlDataLevel;
return this;
diff --git
a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXMLBatchReader.java
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXMLBatchReader.java
index d5dc5b5fe0..5aec7ffdf8 100644
---
a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXMLBatchReader.java
+++
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXMLBatchReader.java
@@ -26,8 +26,10 @@ import org.apache.drill.common.exceptions.CustomErrorContext;
import org.apache.drill.common.exceptions.UserException;
import org.apache.drill.exec.ExecConstants;
import org.apache.drill.exec.physical.impl.scan.framework.SchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.v3.FixedReceiver;
import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
import org.apache.drill.exec.store.ImplicitColumnUtils.ImplicitColumns;
import org.apache.drill.exec.store.http.paginator.Paginator;
import org.apache.drill.exec.store.http.util.SimpleHttp;
@@ -53,7 +55,13 @@ public class HttpXMLBatchReader extends HttpBatchReader {
super(subScan);
this.subScan = subScan;
this.maxRecords = subScan.maxRecords();
- this.dataLevel = subScan.tableSpec().connectionConfig().xmlDataLevel();
+
+ // TODO Remove the XMLDataLevel parameter. For now, check both
+ if (subScan.tableSpec().connectionConfig().xmlOptions() == null) {
+ this.dataLevel = subScan.tableSpec().connectionConfig().xmlDataLevel();
+ } else {
+ this.dataLevel =
subScan.tableSpec().connectionConfig().xmlOptions().getDataLevel();
+ }
}
@@ -61,7 +69,12 @@ public class HttpXMLBatchReader extends HttpBatchReader {
super(subScan, paginator);
this.subScan = subScan;
this.maxRecords = subScan.maxRecords();
- this.dataLevel = subScan.tableSpec().connectionConfig().xmlDataLevel();
+
+ if (subScan.tableSpec().connectionConfig().xmlOptions() == null) {
+ this.dataLevel = subScan.tableSpec().connectionConfig().xmlDataLevel();
+ } else {
+ this.dataLevel =
subScan.tableSpec().connectionConfig().xmlOptions().getDataLevel();
+ }
}
@Override
@@ -96,6 +109,12 @@ public class HttpXMLBatchReader extends HttpBatchReader {
inStream = http.getInputStream();
// Initialize the XMLReader the reader
try {
+ // Add schema if provided
+ TupleMetadata finalSchema = getSchema(negotiator);
+ if (finalSchema != null) {
+ negotiator.tableSchema(finalSchema, false);
+ }
+
xmlReader = new XMLReader(inStream, dataLevel);
resultLoader = negotiator.build();
@@ -121,6 +140,36 @@ public class HttpXMLBatchReader extends HttpBatchReader {
return true;
}
+ /**
+ * This function obtains the correct schema for the {@link XMLReader}.
There are four possibilities:
+ * 1. The schema is provided in the configuration only. In this case, that
schema will be returned.
+ * 2. The schema is provided in both the configuration and inline. These
two schemas will be merged together.
+ * 3. The schema is provided inline in a query. In this case, that schema
will be returned.
+ * 4. No schema is provided. Function returns null.
+ * @param negotiator {@link SchemaNegotiator} The schema negotiator with all
the connection information
+ * @return The built {@link TupleMetadata} of the provided schema, null if
none provided.
+ */
+ private TupleMetadata getSchema(SchemaNegotiator negotiator) {
+ if (subScan.tableSpec().connectionConfig().xmlOptions() != null &&
+ subScan.tableSpec().connectionConfig().xmlOptions().schema() != null) {
+ TupleMetadata configuredSchema =
subScan.tableSpec().connectionConfig().xmlOptions().schema();
+
+ // If it has a provided schema both inline and in the config, merge the
two, otherwise, return the config schema
+ if (negotiator.hasProvidedSchema()) {
+ TupleMetadata inlineSchema = negotiator.providedSchema();
+ return FixedReceiver.Builder.mergeSchemas(configuredSchema,
inlineSchema);
+ } else {
+ return configuredSchema;
+ }
+ } else {
+ if (negotiator.hasProvidedSchema()) {
+ return negotiator.providedSchema();
+ }
+ }
+ return null;
+ }
+
+
@Override
public boolean next() {
boolean result;
diff --git
a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXmlOptions.java
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXmlOptions.java
new file mode 100644
index 0000000000..d73e576778
--- /dev/null
+++
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXmlOptions.java
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.http;
+
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
+import com.fasterxml.jackson.databind.annotation.JsonPOJOBuilder;
+import org.apache.drill.common.PlanStringBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+import java.util.Objects;
+
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+@JsonDeserialize(builder = HttpXmlOptions.HttpXmlOptionsBuilder.class)
+public class HttpXmlOptions {
+
+ @JsonProperty
+ private final int dataLevel;
+
+ @JsonProperty
+ private final TupleMetadata schema;
+
+ @JsonCreator
+ public HttpXmlOptions(@JsonProperty("dataLevel") Integer dataLevel,
+ @JsonProperty("schema") TupleMetadata schema) {
+ this.schema = schema;
+ if (dataLevel == null || dataLevel < 1) {
+ this.dataLevel = 1;
+ } else {
+ this.dataLevel = dataLevel;
+ }
+ }
+
+ public HttpXmlOptions(HttpXmlOptionsBuilder builder) {
+ this.dataLevel = builder.dataLevel;
+ this.schema = builder.schema;
+ }
+
+
+ public static HttpXmlOptionsBuilder builder() {
+ return new HttpXmlOptionsBuilder();
+ }
+
+ @JsonProperty("dataLevel")
+ public int getDataLevel() {
+ return this.dataLevel;
+ }
+
+ @JsonProperty("schema")
+ public TupleMetadata schema() {
+ return this.schema;
+ }
+
+
+ @Override
+ public boolean equals(Object o) {
+ if (this == o) {
+ return true;
+ }
+ if (o == null || getClass() != o.getClass()) {
+ return false;
+ }
+ HttpXmlOptions that = (HttpXmlOptions) o;
+ return Objects.equals(dataLevel, that.dataLevel)
+ && Objects.equals(schema, that.schema);
+ }
+
+ @Override
+ public int hashCode() {
+ return Objects.hash(dataLevel, schema);
+ }
+
+ @Override
+ public String toString() {
+ return new PlanStringBuilder(this)
+ .field("dataLevel", dataLevel)
+ .field("schema", schema)
+ .toString();
+ }
+
+ @JsonPOJOBuilder(withPrefix = "")
+ public static class HttpXmlOptionsBuilder {
+
+ private int dataLevel;
+ private TupleMetadata schema;
+
+ public HttpXmlOptions.HttpXmlOptionsBuilder dataLevel(int dataLevel) {
+ this.dataLevel = dataLevel;
+ return this;
+ }
+
+ public HttpXmlOptions.HttpXmlOptionsBuilder schema(TupleMetadata schema) {
+ this.schema = schema;
+ return this;
+ }
+
+ public HttpXmlOptions build() {
+ return new HttpXmlOptions(this);
+ }
+ }
+}
diff --git
a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
index d0f12f26c2..3568fe9213 100644
---
a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
+++
b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
@@ -104,6 +104,7 @@ public class SimpleHttp implements AutoCloseable {
private static final int DEFAULT_TIMEOUT = 1;
private static final Pattern URL_PARAM_REGEX =
Pattern.compile("\\{(\\w+)(?:=(\\w*))?}");
public static final MediaType JSON_MEDIA_TYPE =
MediaType.get("application/json; charset=utf-8");
+ public static final MediaType XML_MEDIA_TYPE =
MediaType.get("application/xml");
private static final OkHttpClient SIMPLE_CLIENT = new OkHttpClient.Builder()
.connectTimeout(DEFAULT_TIMEOUT, TimeUnit.SECONDS)
.writeTimeout(DEFAULT_TIMEOUT, TimeUnit.SECONDS)
@@ -365,6 +366,20 @@ public class SimpleHttp implements AutoCloseable {
RequestBody requestBody = RequestBody.create(json.toJSONString(),
JSON_MEDIA_TYPE);
requestBuilder.post(requestBody);
+ } else if (apiConfig.getPostLocation() == PostLocation.XML_BODY) {
+ StringBuilder xmlRequest = new StringBuilder();
+ xmlRequest.append("<request>");
+ if (filters != null) {
+ for (Map.Entry<String, String> filter : filters.entrySet()) {
+ xmlRequest.append("<").append(filter.getKey()).append(">");
+ xmlRequest.append(filter.getValue());
+ xmlRequest.append("</").append(filter.getKey()).append(">");
+ }
+ }
+ xmlRequest.append("</request>");
+ RequestBody requestBody = RequestBody.create(xmlRequest.toString(),
XML_MEDIA_TYPE);
+ requestBuilder.post(requestBody);
+
} else {
formBodyBuilder = buildPostBody(apiConfig.postBody());
requestBuilder.post(formBodyBuilder.build());
diff --git
a/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestHttpPlugin.java
b/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestHttpPlugin.java
index c09e3a5503..32e31fca57 100644
---
a/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestHttpPlugin.java
+++
b/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestHttpPlugin.java
@@ -131,11 +131,26 @@ public class TestHttpPlugin extends ClusterTest {
.requireTail(false)
.build();
+ HttpXmlOptions nycXmlOptions = HttpXmlOptions.builder()
+ .dataLevel(5)
+ .build();
+
+ HttpApiConfig nycConfig = HttpApiConfig.builder()
+ .url("https://www.checkbooknyc.com/api")
+ .method("post")
+ .inputType("xml")
+ .requireTail(false)
+ .params(Arrays.asList("type_of_data", "records_from", "max_records"))
+ .postParameterLocation("xml_body")
+ .xmlOptions(nycXmlOptions)
+ .build();
+
Map<String, HttpApiConfig> configs = new HashMap<>();
configs.put("stock", stockConfig);
configs.put("sunrise", sunriseConfig);
configs.put("sunrise2", sunriseWithParamsConfig);
configs.put("pokemon", pokemonConfig);
+ configs.put("nyc", nycConfig);
HttpStoragePluginConfig mockStorageConfigWithWorkspace =
new HttpStoragePluginConfig(false, configs, 10, 1000, null, null, "",
80, "", "", "", null, PlainCredentialsProvider.EMPTY_CREDENTIALS_PROVIDER,
@@ -286,6 +301,26 @@ public class TestHttpPlugin extends ClusterTest {
.dataPath("results")
.build();
+ HttpXmlOptions xmlOptions = new HttpXmlOptions.HttpXmlOptionsBuilder()
+ .dataLevel(2)
+ .build();
+
+ TupleMetadata testSchema = new SchemaBuilder()
+ .add("attributes", MinorType.MAP)
+ .addNullable("COMMON", MinorType.VARCHAR)
+ .addNullable("BOTANICAL", MinorType.VARCHAR)
+ .addNullable("ZONE", MinorType.INT)
+ .addNullable("LIGHT", MinorType.VARCHAR)
+ .addNullable("PRICE", MinorType.VARCHAR)
+ .addNullable("AVAILABILITY", MinorType.VARCHAR)
+ .buildSchema();
+
+ HttpXmlOptions xmlOptionsWithSchema = new
HttpXmlOptions.HttpXmlOptionsBuilder()
+ .dataLevel(2)
+ .schema(testSchema)
+ .build();
+
+
HttpApiConfig mockXmlConfig = HttpApiConfig.builder()
.url(makeUrl("http://localhost:%d/xml"))
.method("GET")
@@ -295,9 +330,22 @@ public class TestHttpPlugin extends ClusterTest {
.password("pass")
.dataPath("results")
.inputType("xml")
- .xmlDataLevel(2)
+ .xmlOptions(xmlOptions)
.build();
+ HttpApiConfig mockXmlConfigWithSchema = HttpApiConfig.builder()
+ .url(makeUrl("http://localhost:%d/xml"))
+ .method("GET")
+ .headers(headers)
+ .authType("basic")
+ .userName("user")
+ .password("pass")
+ .dataPath("results")
+ .inputType("xml")
+ .xmlOptions(xmlOptionsWithSchema)
+ .build();
+
+
HttpApiConfig mockGithubWithParam = HttpApiConfig.builder()
.url(makeUrl("http://localhost:%d/orgs/{org}/repos"))
.method("GET")
@@ -349,6 +397,7 @@ public class TestHttpPlugin extends ClusterTest {
configs.put("mockPostPushdownWithStaticParams",
mockPostPushdownWithStaticParams);
configs.put("mockcsv", mockCsvConfig);
configs.put("mockxml", mockXmlConfig);
+ configs.put("mockxml_with_schema", mockXmlConfigWithSchema);
configs.put("github", mockGithubWithParam);
configs.put("github2", mockGithubWithDuplicateParam);
configs.put("github3", mockGithubWithParamInQuery);
@@ -385,6 +434,7 @@ public class TestHttpPlugin extends ClusterTest {
.addRow("local.mockcsv", "http")
.addRow("local.mockpost", "http")
.addRow("local.mockxml", "http")
+ .addRow("local.mockxml_with_schema", "http")
.addRow("local.nullpost", "http")
.addRow("local.sunrise", "http")
.build();
@@ -505,6 +555,35 @@ public class TestHttpPlugin extends ClusterTest {
doSimpleSpecificQuery(sql);
}
+ @Test
+ @Ignore("Requires Remote Server")
+ public void simpleStarQueryWithXMLParams() throws Exception {
+ String sql = "SELECT year, department, expense_category, budget_code,
budget_name, modified, adopted " +
+ "FROM live.nyc WHERE type_of_data='Budget' AND records_from=1 AND
max_records=5 AND year IS NOT null";
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .add("year", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+ .add("department", TypeProtos.MinorType.VARCHAR,
TypeProtos.DataMode.OPTIONAL)
+ .add("expense_category", TypeProtos.MinorType.VARCHAR,
TypeProtos.DataMode.OPTIONAL)
+ .add("budget_code", TypeProtos.MinorType.VARCHAR,
TypeProtos.DataMode.OPTIONAL)
+ .add("budget_name", TypeProtos.MinorType.VARCHAR,
TypeProtos.DataMode.OPTIONAL)
+ .add("modified", TypeProtos.MinorType.VARCHAR,
TypeProtos.DataMode.OPTIONAL)
+ .add("adopted", TypeProtos.MinorType.VARCHAR,
TypeProtos.DataMode.OPTIONAL)
+ .build();
+
+ RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+ .addRow("2022", "MEDICAL ASSISTANCE - OTPS", "MEDICAL ASSISTANCE",
"9564", "MMIS MEDICAL ASSISTANCE", "5972433142", "5584533142")
+ .addRow("2020", "MEDICAL ASSISTANCE - OTPS", "MEDICAL ASSISTANCE",
"9564", "MMIS MEDICAL ASSISTANCE", "5819588142", "4953233142")
+ .addRow("2014", "MEDICAL ASSISTANCE - OTPS", "MEDICAL ASSISTANCE",
"9564", "MMIS MEDICAL ASSISTANCE", "5708101276", "5231324567")
+ .addRow("2015", "MEDICAL ASSISTANCE - OTPS", "MEDICAL ASSISTANCE",
"9564", "MMIS MEDICAL ASSISTANCE", "5663673673", "5312507361")
+ .build();
+
+ RowSetUtilities.verify(expected, results);
+ }
+
+
private void doSimpleSpecificQuery(String sql) throws Exception {
RowSet results = client.queryBuilder().sql(sql).rowSet();
@@ -758,6 +837,22 @@ public class TestHttpPlugin extends ClusterTest {
}
}
+ @Test
+ public void testSerDeXML() throws Exception {
+ try (MockWebServer server = startServer()) {
+
+ server.enqueue(
+ new MockResponse().setResponseCode(200)
+ .setBody(TEST_XML_RESPONSE)
+ );
+
+ String sql = "SELECT COUNT(*) FROM local.mockxml.`xml?arg1=4` ";
+ String plan = queryBuilder().sql(sql).explainJson();
+ long cnt = queryBuilder().physical(plan).singletonLong();
+ assertEquals("Counts should match", 36L, cnt);
+ }
+ }
+
@Test
public void testSerDeCSV() throws Exception {
try (MockWebServer server = startServer()) {
@@ -874,6 +969,37 @@ public class TestHttpPlugin extends ClusterTest {
}
}
+ @Test
+ public void testXmlWithSchemaResponse() throws Exception {
+ String sql = "SELECT * FROM local.mockxml_with_schema.`?arg1=4` LIMIT 5";
+ try (MockWebServer server = startServer()) {
+
+ server.enqueue(new
MockResponse().setResponseCode(200).setBody(TEST_XML_RESPONSE));
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .add("attributes", MinorType.MAP)
+ .addNullable("COMMON", MinorType.VARCHAR)
+ .addNullable("BOTANICAL", MinorType.VARCHAR)
+ .addNullable("ZONE", MinorType.INT)
+ .addNullable("LIGHT", MinorType.VARCHAR)
+ .addNullable("PRICE", MinorType.VARCHAR)
+ .addNullable("AVAILABILITY", MinorType.VARCHAR)
+ .buildSchema();
+
+ RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+ .addRow(mapArray(), "Bloodroot", "Sanguinaria canadensis", 4, "Mostly
Shady", "$2.44", "031599")
+ .addRow(mapArray(),"Columbine", "Aquilegia canadensis", 3, "Mostly
Shady", "$9.37", "030699")
+ .addRow(mapArray(),"Marsh Marigold", "Caltha palustris", 4, "Mostly
Sunny", "$6.81", "051799")
+ .addRow(mapArray(), "Cowslip", "Caltha palustris", 4, "Mostly Shady",
"$9.90", "030699")
+ .addRow(mapArray(), "Dutchman's-Breeches", "Dicentra cucullaria", 3,
"Mostly Shady", "$6.44", "012099")
+ .build();
+
+ RowSetUtilities.verify(expected, results);
+ }
+ }
+
@Test
public void testImplicitFieldsWithJSON() throws Exception {
String sql = "SELECT _response_code, _response_message,
_response_protocol, _response_url FROM
local.sunrise.`?lat=36.7201600&lng=-4.4203400&date=2019-10-02`";
diff --git
a/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestPagination.java
b/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestPagination.java
index 2334315a3e..5931e0d032 100644
---
a/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestPagination.java
+++
b/contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestPagination.java
@@ -226,6 +226,10 @@ public class TestPagination extends ClusterTest {
List<String> params = new ArrayList<>();
params.add("foo");
+ HttpXmlOptions xmlOptions = HttpXmlOptions.builder()
+ .dataLevel(2)
+ .build();
+
HttpApiConfig mockXmlConfigWithPaginator = HttpApiConfig.builder()
.url("http://localhost:8092/xml")
.method("GET")
@@ -233,7 +237,7 @@ public class TestPagination extends ClusterTest {
.params(params)
.paginator(pagePaginatorForXML)
.inputType("xml")
- .xmlDataLevel(2)
+ .xmlOptions(xmlOptions)
.build();
HttpApiConfig mockXmlConfigWithPaginatorAndUrlParams =
HttpApiConfig.builder()
@@ -243,7 +247,7 @@ public class TestPagination extends ClusterTest {
.params(params)
.paginator(pagePaginatorForXML)
.inputType("xml")
- .xmlDataLevel(2)
+ .xmlOptions(xmlOptions)
.build();
diff --git a/contrib/storage-http/src/test/resources/data/response.xml
b/contrib/storage-http/src/test/resources/data/response.xml
index d9dc3f5c1e..6681266a51 100644
--- a/contrib/storage-http/src/test/resources/data/response.xml
+++ b/contrib/storage-http/src/test/resources/data/response.xml
@@ -197,7 +197,7 @@
<PLANT>
<COMMON>Black-Eyed Susan</COMMON>
<BOTANICAL>Rudbeckia hirta</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Sunny</LIGHT>
<PRICE>$9.80</PRICE>
<AVAILABILITY>061899</AVAILABILITY>
@@ -221,7 +221,7 @@
<PLANT>
<COMMON>Butterfly Weed</COMMON>
<BOTANICAL>Asclepias tuberosa</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Sunny</LIGHT>
<PRICE>$2.78</PRICE>
<AVAILABILITY>063099</AVAILABILITY>
@@ -229,7 +229,7 @@
<PLANT>
<COMMON>Cinquefoil</COMMON>
<BOTANICAL>Potentilla</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Shade</LIGHT>
<PRICE>$7.06</PRICE>
<AVAILABILITY>052599</AVAILABILITY>
@@ -237,7 +237,7 @@
<PLANT>
<COMMON>Primrose</COMMON>
<BOTANICAL>Oenothera</BOTANICAL>
- <ZONE>3 - 5</ZONE>
+ <ZONE>3</ZONE>
<LIGHT>Sunny</LIGHT>
<PRICE>$6.56</PRICE>
<AVAILABILITY>013099</AVAILABILITY>
@@ -261,7 +261,7 @@
<PLANT>
<COMMON>Jacob's Ladder</COMMON>
<BOTANICAL>Polemonium caeruleum</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Shade</LIGHT>
<PRICE>$9.26</PRICE>
<AVAILABILITY>022199</AVAILABILITY>
@@ -269,7 +269,7 @@
<PLANT>
<COMMON>Greek Valerian</COMMON>
<BOTANICAL>Polemonium caeruleum</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Shade</LIGHT>
<PRICE>$4.36</PRICE>
<AVAILABILITY>071499</AVAILABILITY>
@@ -277,7 +277,7 @@
<PLANT>
<COMMON>California Poppy</COMMON>
<BOTANICAL>Eschscholzia californica</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Sun</LIGHT>
<PRICE>$7.89</PRICE>
<AVAILABILITY>032799</AVAILABILITY>
@@ -285,7 +285,7 @@
<PLANT>
<COMMON>Shooting Star</COMMON>
<BOTANICAL>Dodecatheon</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$8.60</PRICE>
<AVAILABILITY>051399</AVAILABILITY>
@@ -293,7 +293,7 @@
<PLANT>
<COMMON>Snakeroot</COMMON>
<BOTANICAL>Cimicifuga</BOTANICAL>
- <ZONE>Annual</ZONE>
+ <ZONE>8</ZONE>
<LIGHT>Shade</LIGHT>
<PRICE>$5.63</PRICE>
<AVAILABILITY>071199</AVAILABILITY>
@@ -306,4 +306,4 @@
<PRICE>$3.02</PRICE>
<AVAILABILITY>022299</AVAILABILITY>
</PLANT>
-</CATALOG>
\ No newline at end of file
+</CATALOG>