Hi Paul,
See comments inline.
> On Jan 8, 2020, at 5:25 PM, Paul Rogers <[email protected]> wrote:
>
> Hi Charles,
>
>
> Makes sense. Perhaps there is a middle ground. Perhaps the user creates a
> different plugin config for each API "group". That is, one for Google,
> another for Facebook, another for CatVideoTube. The config would include
> things like security tokens, REST format and so on.
>
>
> Then, within that, use your idea to have a map of endpoints to table names.
> That is "/people/who/live/in/washington/json" for Facebook gets mapped to a
> table called "DCUsers".
>
I think we can do this as the plugin is currently implemented. Please take a
look at the config below and see if you agree. It may be redundant, but I was
thinking of adding an 'args' option in the connections to separate the URL from
query params. I say it's redundant because you could just make that the URL.
{
  "type": "http",
  "cacheResults": false,
  "timeout": 5,
  "connections": {
    "sunrise": {
      "url": "https://api.sunrise-sunset.org/",
      "method": "get",
      "headers": null,
      "authType": "none",
      "userName": null,
      "password": null,
      "postBody": null
    },
    "jira": {
      "url": "https://<project>.atlassian.net/rest/api/3/",
      "method": "get",
      "headers": {
        "Accept": "application/json"
      },
      "authType": "basic",
      "userName": "<username>",
      "password": "<API Key>",
      "postBody": null
    }
  },
  "enabled": true
}
If this instance of the plugin were called "google", it could be populated with
a collection of Google APIs, and the user could create another called "CatTube"
with CatTube APIs, etc.
One thing to note is that this plugin does validate the configs. I don't
remember all the rules off the top of my head, but, for example, you can't
specify a postBody if you are making a GET request.
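
To make the validation idea concrete, here is a rough Python sketch. It is not
the plugin's actual code (which is Java), and validate_connection is a
hypothetical name. Only the postBody/GET rule comes from what I described
above; the basic-auth check is an assumed extra rule for illustration.

```python
def validate_connection(name, conn):
    """Return a list of problems found in one connection config."""
    problems = []
    # Treat a missing method as GET (an assumption for this sketch).
    method = (conn.get("method") or "get").lower()

    # Rule from the thread: a GET request may not carry a postBody.
    if method == "get" and conn.get("postBody"):
        problems.append(f"{name}: postBody is not allowed with a GET request")

    # Assumed rule: basic auth needs both a user name and a password.
    if conn.get("authType") == "basic" and not (
        conn.get("userName") and conn.get("password")
    ):
        problems.append(f"{name}: basic auth requires userName and password")

    return problems
```

Run against the config above, both "sunrise" and "jira" would return an empty
list, while a GET connection with a non-null postBody would be rejected.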
>
> Point is, these configs are all for the same service; with multiple plugin
> configs for different services.
>
>
> We will still need to map SQL predicates to JSON query strings. This is a big
> mess at present (Each plugin copies the same wad of code from one to the
> next, sometimes with the same bugs.) So makes sense to tackle that part step
> by step.
Are you referring to filter pushdown here? At the moment, I completely removed
the legacy code from this plugin in anticipation of the Base Storage PR being
committed. Once that's done, I can start working on that. The original
implementation of this plugin allowed the user to specify filters with a
leading $. These fields would be transmitted in the URL. So, for example:
SELECT <fields> FROM api... WHERE $field_1 = 20
would map to
http://someapi.com/?field_1=20
I removed this code as I thought the same thing could be accomplished either in
the query itself or elsewhere. It did seem like a good idea, however.
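
For reference, the old behavior amounted to something like the Python sketch
below. This is a reconstruction of the idea only, not the removed code:
build_url is a hypothetical name, and I am assuming that only equality filters
on $-prefixed fields get pushed into the URL while everything else is left for
Drill to evaluate.

```python
from urllib.parse import urlencode


def build_url(base_url, filters):
    """Append $-prefixed equality filters to base_url as query parameters."""
    # Keep only fields with a leading "$" and strip the marker from the name.
    params = {
        name[1:]: value
        for name, value in filters.items()
        if name.startswith("$")
    }
    if not params:
        return base_url
    # Use "&" if the URL already carries a query string, "?" otherwise.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)
```

With this, build_url("http://someapi.com/", {"$field_1": 20}) yields
http://someapi.com/?field_1=20, matching the example above.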
>
>
> We can note that each endpoint will need different parameters, so they may
> want to be part of your endpoint config. That is the "users" table might take
> a "city" parameter, while a "posts" endpoint might take "topic", "start date"
> and "end date" parameters.
What I was thinking was that we could add some default query params in the
config.
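
Roughly, I picture the defaults being merged with whatever the query supplies,
with the query winning on conflicts. A small Python sketch of that merge
follows. apply_default_params is a hypothetical helper, and the precedence rule
is my assumption; none of this exists in the plugin today.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_default_params(url, defaults):
    """Add default query params to url; params already in url take precedence."""
    parts = urlsplit(url)
    params = dict(defaults)
    # Parameters given explicitly in the URL override the configured defaults.
    params.update(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

So a "posts" endpoint could carry a default "topic", and a query that names its
own "topic" would simply replace it.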
>
>
> The great thing about the REST "standard" is that everyone has their own.
> You'll probably get lots of feature requests as folks encounter all the odd
> things that have been done with REST. No need to boil the ocean; these can be
> tackled as they crop up.
>
>
> I like your idea of using Swagger definitions. At least that imposes some
> sanity on REST. (But, of course, someone will REALLY need to get at a
> non-Swagger API.)
I think there are a few "standards" out there, Swagger being one of them, that
allow REST owners to define a schema. It would be good for Drill to take
advantage of this, but that is in the future. One advantage is that it would
enable the plugin to actually generate useful information for the
INFORMATION_SCHEMA.
>
>
> Another thing that would be handy would be sharding: the ability to take a
> big query (all those Washington DC users) and split them into smaller queries
> (DC users A-G, H-M, N-T, U-Z). Then, the big REST query could be done in
> parallel as a series of smaller queries. Works especially well for things
> like time series data (which is the case I had to handle recently.)
Hmm.. that is interesting. I really like that idea.
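
For the time series case, the splitting step might look like the Python sketch
below. It is only an illustration of the idea: split_date_range is a
hypothetical name, and I am assuming half-open [start, end) day ranges that
each turn into one smaller REST request.

```python
from datetime import timedelta


def split_date_range(start, end, shards):
    """Split [start, end) into contiguous, non-overlapping day ranges."""
    total_days = (end - start).days
    if total_days <= 0 or shards <= 0:
        return []
    # Never create more shards than there are whole days to cover.
    shards = min(shards, total_days)
    base, extra = divmod(total_days, shards)
    ranges = []
    cursor = start
    for i in range(shards):
        # The first `extra` shards take one additional day each.
        length = base + (1 if i < extra else 0)
        next_cursor = cursor + timedelta(days=length)
        ranges.append((cursor, next_cursor))
        cursor = next_cursor
    return ranges
```

Each (start, end) pair would then be sent as its own request, and the requests
could run in parallel.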
>
>
> Thanks,
>
> - Paul
>
>
>
> On Wednesday, January 8, 2020, 01:32:20 PM PST, Charles Givre
> <[email protected]> wrote:
>
>
> Hey Paul,
> Thanks for the review on the plugin. I figured I'd send a response here on the
> dev alias so as to not spam everyone with GitHub responses.
>
> Design:
> I did a lot of experimenting with this. The original implementation I found
> set things up such that a user could specify an endpoint in the config and
> append args in the query. For instance, you might have an endpoint called
> facebook which pinged some API at facebook. The query args were the table,
> and thus the queries looked like:
>
> SELECT *
> FROM facebook.`?field_1=foo&field_2=bar`
>
> You could append as much as you wanted so you could theoretically have
> something like:
>
> SELECT *
> FROM facebook.`/people/who/live/in/washington/json?field_1=foo&field_2=bar`
>
> That "table" name is appended to the URL specified in the plugin config to
> make the actual request.
>
> However, after using it for a while, I wasn't loving the implementation and
> thought it would be better to have sub configs, similar to the idea of
> workspaces in the dfs plugin. Thus, you can now create a plugin called
> 'googleapis' and within that, have different APIs. For example:
>
> SELECT *
> FROM googleapis.googlesheets.`?param1=foo&param2=bar`
>
> and
>
> SELECT *
> FROM googleapis.googledoc.`?param1=foo`
>
> This seemed a lot more usable than the original implementation and would
> prevent a proliferation of copies of the plugin if you were querying a bunch
> of different APIs.
>
> I should add that as currently implemented, the API plugin can do POST
> queries and the user can add params to the POST body.
>
>
> Future Functionality:
> I intended this plugin really to be an MVP and perhaps something which could
> be extended for other systems that use HTTP/REST as an interface. If this
> gets used, and I do think it will, I plan on adding:
>
> Support for OAUTH2
> Filter pushdown (once the Base framework is committed)
> Schemas from swagger and/or OpenAPI
>
> Does this make sense?
>
>