[
https://issues.apache.org/jira/browse/DRILL-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382942#comment-17382942
]
ASF GitHub Bot commented on DRILL-7970:
---------------------------------------
paul-rogers commented on a change in pull request #2270:
URL: https://github.com/apache/drill/pull/2270#discussion_r671914448
##########
File path: contrib/storage-http/README.md
##########
@@ -45,6 +45,23 @@ The `connection` property can accept the following options.
`url`: The base URL which Drill will query.
+##### Parameters in the URL
+Many APIs require parameters to be passed directly in the URL instead of as
query arguments. For example, github's API allows you to query an
organization's repositories with the following
+URL: https://github.com/orgs/{org}/repos
+
+As of Drill 1.20.0, you can simply set the URL in the connection using the
curly braces. If your API includes URL parameters you MUST
+include them in the `WHERE` clause in your query, or Drill will throw an
error.
+
+As an example, the API above, you would have to query as shown below:
+
+```sql
+SELECT *
+FROM api.github
+WHERE org = 'apache'
+```
+
+This query would replace the `org`in the URL with the value from the `WHERE`
clause, in this case `apache`.
Review comment:
What data types are allowed? Can I say `account = 1234` or `verbose =
false`?
Some URLs allow optional tail arguments:
`https:foo.com/api/projects[/{id}]`. Can this be handled or are two distinct
plugins needed?
Often there is a default value: `foo.com` for the `{org}` in Github. Can the
default be set in the URL `{org=foo.com}` or in the plugin config?
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
Review comment:
There is a method to do this above: `decodedURL()`. Why not use that
here?
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedURL);
+ String param;
+ while (matcher.find()) {
+ param = matcher.group();
+ param = param.replace("{", "");
+ param = param.replace("}", "");
+ parameters.add(param);
Review comment:
This is pretty slow and clunky. Better pattern: `\{(\w+)\}`. The
*capturing group* has only the bits you want.
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
Review comment:
Nit: the above requires we parse the parameters twice: once to see if
any exist, another time to get them. Can the two steps be combined? Extract a
list of parameters? If empty, then there are none. If non-empty, then do the
substitution.
##########
File path:
contrib/storage-http/src/test/java/org/apache/drill/exec/store/http/TestHttpPlugin.java
##########
@@ -140,12 +141,17 @@ private static void makeMockConfig() {
HttpApiConfig mockXmlConfig = new
HttpApiConfig("http://localhost:8091/xml", "GET", headers,
"basic", "user", "pass", null, null, "results", null, "xml", 2,false);
+ HttpApiConfig mockGithubWithParam = new
HttpApiConfig("http://localhost:8091/orgs/{org}/repos", "GET", headers,
Review comment:
Please add tests for the various cases above, along with:
* Duplicate names
* Mixed-case names (param="org", column="ORG", etc.)
* Missing column name
* Column of the wrong type
* Parameter is in the query clause (something that should be tested anyway,
url: "...?foo=bar"
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpPushDownListener.java
##########
@@ -45,6 +47,7 @@
* <li>An AND'ed set of such expressions,</li>
* <li>If the value is one with an unambiguous conversion to
* a string. (That is, not dates, binary, maps, etc.)</li>
+ * <li>A field from the URL. For instance in some APIs you have parameters
that are part of the path.</li>
Review comment:
Nit: include your example.
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -57,6 +63,7 @@
*/
public class SimpleHttp {
private static final Logger logger =
LoggerFactory.getLogger(SimpleHttp.class);
+ private static final Pattern URL_PARAM_REGEX =
Pattern.compile("\\{([A-Za-z0-9_]+)\\}");
Review comment:
Simpler: `\\{\w+\\}`. `\w` [is
defined](https://www.oracle.com/technical-resources/articles/java/regex.html)
as `[a-zA-Z_0-9]`
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedURL);
+ String param;
+ while (matcher.find()) {
+ param = matcher.group();
Review comment:
Nit: `String param = ...` Since the variable is not used outside of the
loop scope.
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedURL);
+ String param;
+ while (matcher.find()) {
+ param = matcher.group();
+ param = param.replace("{", "");
+ param = param.replace("}", "");
+ parameters.add(param);
+ }
+ return parameters;
+ }
+
+ /**
+ * Used for APIs which have parameters in the URL. This function maps the
filters pushed down
+ * from the query into the URL. For example the API:
github.com/orgs/{org}/repos requires a user to
+ * specify an organization and replace {org} with an actual organization.
The filter is passed down from
+ * the query.
+ *
+ * Note that if a URL contains URL parameters and one is not provided in the
filters, Drill will throw
+ * a UserException.
+ *
+ * @param url The HttpUrl containing URL Parameters
+ * @param filters: A HashMap of filters
+ * @return A string of the URL with the URL parameters replaced by filter
values
+ */
+ public static String mapURLParameters(HttpUrl url, Map<String, String>
filters) {
+ if (! hasURLParameters(url)) {
+ return url.toString();
+ }
+
+ List<String> params = SimpleHttp.getURLParameters(url);
+ String tempUrl = SimpleHttp.decodedURL(url);
+ for (String param : params) {
+ if (filters == null) {
+ throw UserException
+ .parseError()
+ .message("API Query with URL Parameters must be populated. Parameter
" + param + " must be included in WHERE clause.")
+ .build(logger);
+ }
+
+ String value = filters.get(param);
Review comment:
SQL is case insensitive. Is the `filters` a case insensitive hash map?
Or, should we convert all keys to a lower case?
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedURL);
+ String param;
+ while (matcher.find()) {
+ param = matcher.group();
+ param = param.replace("{", "");
+ param = param.replace("}", "");
+ parameters.add(param);
+ }
+ return parameters;
+ }
+
+ /**
+ * Used for APIs which have parameters in the URL. This function maps the
filters pushed down
+ * from the query into the URL. For example the API:
github.com/orgs/{org}/repos requires a user to
+ * specify an organization and replace {org} with an actual organization.
The filter is passed down from
+ * the query.
+ *
+ * Note that if a URL contains URL parameters and one is not provided in the
filters, Drill will throw
+ * a UserException.
+ *
+ * @param url The HttpUrl containing URL Parameters
+ * @param filters: A HashMap of filters
+ * @return A string of the URL with the URL parameters replaced by filter
values
+ */
+ public static String mapURLParameters(HttpUrl url, Map<String, String>
filters) {
+ if (! hasURLParameters(url)) {
+ return url.toString();
+ }
+
+ List<String> params = SimpleHttp.getURLParameters(url);
+ String tempUrl = SimpleHttp.decodedURL(url);
+ for (String param : params) {
+ if (filters == null) {
+ throw UserException
+ .parseError()
+ .message("API Query with URL Parameters must be populated. Parameter
" + param + " must be included in WHERE clause.")
+ .build(logger);
+ }
+
+ String value = filters.get(param);
+ // If the param is not populated, throw an exception
+ if (Strings.isNullOrEmpty(value)) {
+ throw UserException
+ .parseError()
+ .message("API Query with URL Parameters must be populated. Parameter
" + param + " must be included in WHERE clause.")
+ .build(logger);
+ } else {
+ tempUrl = tempUrl.replace("/{" + param + "}", "/" + value);
Review comment:
Three comments:
1. The above only replaces the first instance. For
`.../{ORG}/projects/{ORG}` there will be two entries, so this might be OK.
2. The `param` is of the from the URL, so cases match. But, if we mapped
names to lower case to match the case-insensitive column names, we have to
remember to use the original case here.
3. The substitution is unstable. If I have the pathological
`.../{foo}/{bar}` and in my query I have `foo='{bar}', then the result will be
wrong.
4. It turns out that the regex find method can tell us the offset of the
matched pattern. It would be better (and would solve the above issues) if we
converted the input URL into a list of prefix/pattern values in which the
pattern has the start offset, length, name and optional default. The prefix is
the part of the URL between patterns. There may be a prefix-only last segment,
or some segments with no prefixes. Examples, noting that we do nothing to
police where the parameters are placed in the URL:
```text
{url} -->
[{prefix="", start=0, length=5, name="url"}]
{method}://foo.com/latest -->
[prefix="", start=0, length=8, name="method"},
{prefix="://foo.com/latest", start=-1, length=-1, name=null}]
https://foo.com/api/{op} -->
[{prefix="https://foo.com/api/", start=xx, length=xx, name="op"}]
https://foo.com/api/?op={op} -->
[{prefix="https://foo.com/api/?op=", start=xx, length=xx, name="op"}]
https://foo.com/api/{org}/projects/{org}/details -->
[{prefix="https://foo.com/api/", start=xx, length=xx, name="org"},
{prefix="/projects/", start=xx, length=xx, name="org"},
{prefix="/details", start=-1, length=-1, name=null}]
```
We do have tests for all of these?
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedURL);
+ String param;
+ while (matcher.find()) {
+ param = matcher.group();
+ param = param.replace("{", "");
+ param = param.replace("}", "");
+ parameters.add(param);
+ }
+ return parameters;
+ }
+
+ /**
+ * Used for APIs which have parameters in the URL. This function maps the
filters pushed down
+ * from the query into the URL. For example the API:
github.com/orgs/{org}/repos requires a user to
+ * specify an organization and replace {org} with an actual organization.
The filter is passed down from
+ * the query.
+ *
+ * Note that if a URL contains URL parameters and one is not provided in the
filters, Drill will throw
+ * a UserException.
+ *
+ * @param url The HttpUrl containing URL Parameters
+ * @param filters: A HashMap of filters
+ * @return A string of the URL with the URL parameters replaced by filter
values
+ */
+ public static String mapURLParameters(HttpUrl url, Map<String, String>
filters) {
+ if (! hasURLParameters(url)) {
+ return url.toString();
+ }
+
+ List<String> params = SimpleHttp.getURLParameters(url);
+ String tempUrl = SimpleHttp.decodedURL(url);
+ for (String param : params) {
+ if (filters == null) {
+ throw UserException
+ .parseError()
+ .message("API Query with URL Parameters must be populated. Parameter
" + param + " must be included in WHERE clause.")
+ .build(logger);
+ }
+
+ String value = filters.get(param);
+ // If the param is not populated, throw an exception
+ if (Strings.isNullOrEmpty(value)) {
Review comment:
A Java `null` presumably means the value is missing. That would be a
great place to handle defaults. An empty string might be OK:
`https://acertaincompany.com/projects{PROJ}` where the parameter is either
`""` or `"/1234"`.
Also, there is nothing stopping me from doing `../foo?bar={bar}` and a blank
value might be OK. Note that this "feature" means that we've got to parse the
URL to see if the user included any query params before we add any of ours so
we get `?bar=xxx&ours=10` and not `?bar=xxx?ours=10`.
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedURL);
+ String param;
+ while (matcher.find()) {
+ param = matcher.group();
+ param = param.replace("{", "");
+ param = param.replace("}", "");
+ parameters.add(param);
+ }
+ return parameters;
+ }
+
+ /**
+ * Used for APIs which have parameters in the URL. This function maps the
filters pushed down
+ * from the query into the URL. For example the API:
github.com/orgs/{org}/repos requires a user to
+ * specify an organization and replace {org} with an actual organization.
The filter is passed down from
+ * the query.
+ *
+ * Note that if a URL contains URL parameters and one is not provided in the
filters, Drill will throw
+ * a UserException.
+ *
+ * @param url The HttpUrl containing URL Parameters
+ * @param filters: A HashMap of filters
+ * @return A string of the URL with the URL parameters replaced by filter
values
+ */
+ public static String mapURLParameters(HttpUrl url, Map<String, String>
filters) {
+ if (! hasURLParameters(url)) {
+ return url.toString();
+ }
+
+ List<String> params = SimpleHttp.getURLParameters(url);
+ String tempUrl = SimpleHttp.decodedURL(url);
+ for (String param : params) {
+ if (filters == null) {
Review comment:
We're looping per parameter. An empty filter list should have been
checked earlier. But it is a special case of missing parameter.
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
##########
@@ -322,6 +336,99 @@ private void setupCache(Builder builder) {
return formBodyBuilder;
}
+ /**
+ * Returns the URL-decoded URL
+ *
+ * @return Returns the URL-decoded URL
+ */
+ public static String decodedURL(HttpUrl url) {
+ try {
+ return URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return url.toString();
+ }
+ }
+
+ /**
+ * Returns true if the url has url parameters, as indicated by the presence
of
+ * {param} in a url.
+ *
+ * @return True if there are URL params, false if not
+ */
+ public static boolean hasURLParameters(HttpUrl url) {
+ String decodedUrl = SimpleHttp.decodedURL(url);
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedUrl);
+ return matcher.find();
+ }
+
+ /**
+ * APIs sometimes are structured with parameters in the URL itself. For
instance, to request a list of
+ * an organization's repositories in github, the URL is:
https://api.github.com/orgs/{org}/repos, where
+ * you can replace the org with the actual organization name.
+ *
+ * @return A list of URL parameters enclosed by curly braces.
+ */
+ public static List<String> getURLParameters(HttpUrl url) {
+ List<String> parameters = new ArrayList<>();
+ String decodedURL;
+ try {
+ decodedURL = URLDecoder.decode(url.toString(), "UTF-8");
+ } catch (UnsupportedEncodingException e) {
+ return null;
+ }
+ Matcher matcher = URL_PARAM_REGEX.matcher(decodedURL);
+ String param;
+ while (matcher.find()) {
+ param = matcher.group();
+ param = param.replace("{", "");
+ param = param.replace("}", "");
+ parameters.add(param);
+ }
+ return parameters;
+ }
+
+ /**
+ * Used for APIs which have parameters in the URL. This function maps the
filters pushed down
+ * from the query into the URL. For example the API:
github.com/orgs/{org}/repos requires a user to
+ * specify an organization and replace {org} with an actual organization.
The filter is passed down from
+ * the query.
+ *
+ * Note that if a URL contains URL parameters and one is not provided in the
filters, Drill will throw
+ * a UserException.
+ *
+ * @param url The HttpUrl containing URL Parameters
+ * @param filters: A HashMap of filters
+ * @return A string of the URL with the URL parameters replaced by filter
values
+ */
+ public static String mapURLParameters(HttpUrl url, Map<String, String>
filters) {
+ if (! hasURLParameters(url)) {
+ return url.toString();
+ }
+
+ List<String> params = SimpleHttp.getURLParameters(url);
Review comment:
This design seems to allow `.../{org}/projects/{org}/...`, that is, two
occurrences of the same parameter. That is, we have a list of parameter names,
not a map. This is probably a feature, not a bug...
##########
File path:
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpPushDownListener.java
##########
@@ -82,9 +85,20 @@ public ScanPushDownListener builderFor(GroupScan groupScan) {
HttpScanPushDownListener(HttpGroupScan groupScan) {
this.groupScan = groupScan;
+ // Add fields from config
for (String field : groupScan.getHttpConfig().params()) {
filterParams.put(field, field);
}
+
+ // Add fields from the URL path as denoted by {}
+ HttpUrl url = HttpUrl.parse(groupScan.getHttpConfig().url());
+ if (url != null) {
+ List<String> urlParams = SimpleHttp.getURLParameters(url);
+ assert urlParams != null;
Review comment:
Assertions are OK if they indicate a code path which should never occur.
An assertion tells us that the *code* is wrong.
However, here, the outcome depends on what the user provided in the URL,
correct? So, this should be a `UserError`, not an assertion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
> Add URL Parameters to HTTP Plugin
> ---------------------------------
>
> Key: DRILL-7970
> URL: https://issues.apache.org/jira/browse/DRILL-7970
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.19.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.20.0
>
>
> Many APIs require the user to pass parameters as part of the URL path. This
> PR adds support for such a capability.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)