Repository: incubator-gobblin Updated Branches: refs/heads/master 249fe6234 -> 1912eb790
[GOBBLIN-482] Add http write documentation [GOBBLIN-482] Add http write documentation Fix failed test Address comments from abti Closes #2352 from zxcware/hdoc Project: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/commit/1912eb79 Tree: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/tree/1912eb79 Diff: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/diff/1912eb79 Branch: refs/heads/master Commit: 1912eb7903f526fd2712052d3b628383697d124f Parents: 249fe62 Author: zhchen <[email protected]> Authored: Fri May 4 11:23:37 2018 -0700 Committer: Abhishek Tiwari <[email protected]> Committed: Fri May 4 11:23:37 2018 -0700 ---------------------------------------------------------------------- gobblin-docs/img/Http-Write.png | Bin 0 -> 39439 bytes gobblin-docs/sinks/Http.md | 108 +++++++++++++++++++ gobblin-docs/sinks/REST-Writer.md | 15 --- .../gobblin/writer/AsyncHttpWriterBuilder.java | 14 ++- .../gobblin/writer/AsyncHttpWriterTest.java | 1 + mkdocs.yml | 2 +- 6 files changed, 122 insertions(+), 18 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/1912eb79/gobblin-docs/img/Http-Write.png ---------------------------------------------------------------------- diff --git a/gobblin-docs/img/Http-Write.png b/gobblin-docs/img/Http-Write.png new file mode 100644 index 0000000..fde345b Binary files /dev/null and b/gobblin-docs/img/Http-Write.png differ http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/1912eb79/gobblin-docs/sinks/Http.md ---------------------------------------------------------------------- diff --git a/gobblin-docs/sinks/Http.md b/gobblin-docs/sinks/Http.md new file mode 100644 index 0000000..e5c3dbc --- /dev/null +++ b/gobblin-docs/sinks/Http.md @@ -0,0 +1,108 @@ +Table of Contents +------------------ +[TOC] + +# Introduction + +Writing to a http based sink is done by sending a http or restful request and handling the response. Given +the endpoint uri, query parameters, and body, it is straightforward to construct a http request. The idea +is to build a writer that writes a http record, which contains those elements of a request. The writer +builds a http or rest request from multiple http records, sends the request with a client that knows the server, +and handles the response. + +## Note +The old http write framework under [`AbstractHttpWriter`](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriter.java) +and [`AbstractHttpWriterBuilder`](https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/writer/http/AbstractHttpWriterBuilder.java) +is deprecated (Deprecation date: 05/15/2018)! Use `AsyncHttpWriter` and `AsyncHttpWriterBuilder` instead + +# Constructs +<p align="center"> + <img src=../../img/Http-Write.png> +</p> +<p style="text-align: center;"> Figure 1. Http write flow </p> + +## `HttpOperation` +A http record is represented as a `HttpOperation` object. It has 4 fields. + +| Field Name | Description | Example +|---|---|---| +| `keys` | Optional, a key/value map to interpolate the url template | ```{"memberId": "123"}``` | +| `queryParams` | Optional, a map from query parameter to its value| ```{"action": "update"}``` | +| `headers` | Optional, a map from header key to ts value | ```{"version": "2.0"}``` | +| `body` | Optional, the request body in string or json string format | ```"{\"email\": \"[email protected]\"}"``` | + +Given an url template, ```http://www.test.com/profiles/${memberId}```, from job configuration, the resolved +example request url with `keys` and `queryParams` information will be ```http://www.test.com/profiles/123?action=update```. + +## `AsyncRequestBuilder` +An `AsyncRequestBuilder` builds an `AsyncRequest` from a collection of `HttpOperation` records. It could build one +request per record or batch multiple records into a single request. A builder is also responsible for +putting the `headers` and setting the `body` to the request. + +## `HttpClient` +A `HttpClient` sends a request and returns a response. If necessary, it should setup the connection to the server, for +example, sending an authorization request to get access token. How authorization is done is per use case. Gobblin does +not provide general support for authorization yet. + +## `ResponseHandler` +A `ResponseHandler` handles a response of a request. It returns a `ResponseStatus` object to the framework, which +would resend the request if it's a `SERVER_ERROR`. + +# Build an asynchronous writer +`AsyncHttpWriterBuilder` is the base builder to build an asynchronous http writer. A specific writer can be created by +providing the 3 major components: a `HttpClient`, a `AsyncRequestBuilder`, and a `ResponseHandler`. + +Gobblin offers 2 implementations of async +http writers. As long as your write requirement can be expressed as a `HttpOperation` through a `Converter`, the +2 implementations should work with configurations. + +## `AvroHttpWriterBuilder` +An `AvroHttpWriterBuilder` builds an `AsyncHttpWriter` on top of the [apache httpcomponents framework](https://hc.apache.org/), sending vanilla http request. +The 3 major components are: + + - `ApacheHttpClient`. It uses [`CloseableHttpClient`](https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/main/java/org/apache/hc/client5/http/impl/classic/CloseableHttpClient.java) to + send [`HttpUriRequest`](https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/main/java/org/apache/hc/client5/http/classic/methods/HttpUriRequest.java) + and receive [`CloseableHttpResponse`](https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/main/java/org/apache/hc/client5/http/impl/classic/CloseableHttpResponse.java) + - `ApacheHttpRequestBuilder`. It builds a `ApacheHttpRequest`, which is an `AsyncRequest` that wraps the `HttpUriRequest`, from one `HttpOperation` + - `ApacheHttpResponseHandler`. It handles a `HttpResponse` + +Configurations for the builder are: + +| Configuration | Description | Example +|---|---|---| +| `gobblin.writer.http.urlTemplate` | Required, the url template(schema and port included), together with `keys` and `queryParams`, to be resolved to request url | ```http://www.test.com/profiles/${memberId}``` | +| `gobblin.writer.http.verb` | Required, [http verbs](http://www.restapitutorial.com/lessons/httpmethods.html) | get, update, delete, etc | +| `gobblin.writer.http.maxAttempts` | Optional, max number of attempts including initial send | Default is 3 | +| `gobblin.writer.http.contentType` | Optional, content type of the request body | ```"application/json"```, which is the default value | + +## `R2RestWriterBuilder` +A `R2RestWriterBuilder` builds an `AsyncHttpWriter` on top of [restli r2 framework](https://github.com/linkedin/rest.li/wiki/Request---Response-API-(R2)), sending +rest request. The 3 major components are: + + - `R2Client`. It uses a R2 [`Client`](https://github.com/linkedin/rest.li/blob/master/r2-core/src/main/java/com/linkedin/r2/transport/common/Client.java) to + send [`RestRequest`](https://github.com/linkedin/rest.li/blob/master/r2-core/src/main/java/com/linkedin/r2/message/rest/RestRequest.java) and + receive [`RestResponse`](https://github.com/linkedin/rest.li/blob/master/r2-core/src/main/java/com/linkedin/r2/message/rest/RestResponse.java) + - `R2RestRequestBuilder`. It builds a `R2Request`, which is an `AsyncRequest` that wraps the `RestRequest`, from one `HttpOperation` + - `R2RestResponseHandler`. It handles a `RestResponse` + + `R2RestWriterBuilder` has [d2](https://github.com/linkedin/rest.li/wiki/Dynamic-Discovery) and ssl support. Configurations(`(d2.)` part should be added in d2 mode) for the builder are: + + | Configuration | Description | Example + |---|---|---| + | `gobblin.writer.http.urlTemplate` | Required, the url template(schema and port included), together with `keys` and `queryParams`, to be resolved to request url. If the schema is `d2`, d2 is enabled | ```http://www.test.com/profiles/${memberId}``` | + | `gobblin.writer.http.verb` | Required, [rest(rest.li) verbs](https://github.com/linkedin/rest.li/wiki/Rest.li-User-Guide#resource-methods) | get, update, put, delete, etc | + | `gobblin.writer.http.maxAttempts` | Optional, max number of attempts including initial send | Default is 3 | + | `gobblin.writer.http.d2.zkHosts`| Required for d2, the zookeeper address | | + | `gobblin.writer.http.(d2.)ssl`| Optional, enable ssl | Default is false | + | `gobblin.writer.http.(d2.)keyStoreFilePath`| Required for ssl | /tmp/identity.p12 | + | `gobblin.writer.http.(d2.)keyStoreType`| Required for ssl | PKCS12 | + | `gobblin.writer.http.(d2.)keyStorePassword`| Required for ssl | | + | `gobblin.writer.http.(d2.)trustStoreFilePath`| Required for ssl | | + | `gobblin.writer.http.(d2.)trustStorePassword`| Required for ssl | | + | `gobblin.writer.http.protocolVersion` | Optional, protocol version of rest.li | ```2.0.0```, which is the default value | + +`R2RestWriterBuilder` isn't ingegrated with `PasswordManager` to process encrypted passwords yet. The task is tracked as https://issues.apache.org/jira/browse/GOBBLIN-487 + +# Build a synchronous writer +The idea is to reuse an asynchronous writer to build its synchronous version. The technical difference between them +is the size of outstanding writes. Set `gobblin.writer.http.maxOutstandingWrites` to be `1`(default value is `1000`) to make a synchronous writer http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/1912eb79/gobblin-docs/sinks/REST-Writer.md ---------------------------------------------------------------------- diff --git a/gobblin-docs/sinks/REST-Writer.md b/gobblin-docs/sinks/REST-Writer.md deleted file mode 100644 index 9d13a72..0000000 --- a/gobblin-docs/sinks/REST-Writer.md +++ /dev/null @@ -1,15 +0,0 @@ - -# Description - - -TODO [Issue #1545](https://github.com/linkedin/gobblin/issues/1545) - - -# Usage - - -See the following classes under `gobblin.writer.http` for examples: - -- [`AbstractHttpWriter`](https://github.com/linkedin/gobblin/search?utf8=%E2%9C%93&q=AbstractHttpWriter) -- [`RestJsonWriter`](https://github.com/linkedin/gobblin/search?utf8=%E2%9C%93&q=RestJsonWriter) -- [`SalesforceRestWriter`](https://github.com/linkedin/gobblin/search?utf8=%E2%9C%93&q=SalesforceRestWriter) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/1912eb79/gobblin-modules/gobblin-http/src/main/java/org/apache/gobblin/writer/AsyncHttpWriterBuilder.java ---------------------------------------------------------------------- diff --git a/gobblin-modules/gobblin-http/src/main/java/org/apache/gobblin/writer/AsyncHttpWriterBuilder.java b/gobblin-modules/gobblin-http/src/main/java/org/apache/gobblin/writer/AsyncHttpWriterBuilder.java index 9db93d1..acd3d31 100644 --- a/gobblin-modules/gobblin-http/src/main/java/org/apache/gobblin/writer/AsyncHttpWriterBuilder.java +++ b/gobblin-modules/gobblin-http/src/main/java/org/apache/gobblin/writer/AsyncHttpWriterBuilder.java @@ -48,9 +48,15 @@ import lombok.extern.slf4j.Slf4j; @Slf4j public abstract class AsyncHttpWriterBuilder<D, RQ, RP> extends FluentDataWriterBuilder<Void, D, AsyncHttpWriterBuilder<D, RQ, RP>> { public static final String CONF_PREFIX = "gobblin.writer.http."; + + private static final String MAX_OUTSTANDING_WRITES = "maxOutstandingWrites"; + private static final String MAX_ATTEMPTS = "maxAttempts"; + private static final Config FALLBACK = ConfigFactory.parseMap(ImmutableMap.<String, Object>builder() .put(HttpConstants.ERROR_CODE_WHITELIST, "") + .put(MAX_OUTSTANDING_WRITES, AsyncWriterManager.MAX_OUTSTANDING_WRITES_DEFAULT) + .put(MAX_ATTEMPTS, AsyncHttpWriter.DEFAULT_MAX_ATTEMPTS) .build()); @Getter @@ -66,9 +72,10 @@ public abstract class AsyncHttpWriterBuilder<D, RQ, RP> extends FluentDataWriter @Getter protected int queueCapacity = AbstractAsyncDataWriter.DEFAULT_BUFFER_CAPACITY; @Getter - protected int maxAttempts = AsyncHttpWriter.DEFAULT_MAX_ATTEMPTS; - @Getter protected SharedResourcesBroker<GobblinScopeTypes> broker = null; + @Getter + protected int maxAttempts; + private int maxOutstandingWrites; /** * For backward compatibility on how Fork creates writer, invoke fromState when it's called writeTo method. @@ -91,6 +98,8 @@ public abstract class AsyncHttpWriterBuilder<D, RQ, RP> extends FluentDataWriter this.broker = this.state.getTaskBroker(); Config config = ConfigBuilder.create().loadProps(state.getProperties(), CONF_PREFIX).build(); config = config.withFallback(FALLBACK); + this.maxOutstandingWrites = config.getInt(MAX_OUTSTANDING_WRITES); + this.maxAttempts = config.getInt(MAX_ATTEMPTS); return fromConfig(config); } @@ -112,6 +121,7 @@ public abstract class AsyncHttpWriterBuilder<D, RQ, RP> extends FluentDataWriter return AsyncWriterManager.builder() .config(ConfigUtils.propertiesToConfig(getState().getProperties())) .asyncDataWriter(new AsyncHttpWriter(this)) + .maxOutstandingWrites(maxOutstandingWrites) .retriesEnabled(false) // retries are done in HttpBatchDispatcher .commitTimeoutMillis(10000L) .failureAllowanceRatio(0).build(); http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/1912eb79/gobblin-modules/gobblin-http/src/test/java/org/apache/gobblin/writer/AsyncHttpWriterTest.java ---------------------------------------------------------------------- diff --git a/gobblin-modules/gobblin-http/src/test/java/org/apache/gobblin/writer/AsyncHttpWriterTest.java b/gobblin-modules/gobblin-http/src/test/java/org/apache/gobblin/writer/AsyncHttpWriterTest.java index 127f41c..b8e7826 100644 --- a/gobblin-modules/gobblin-http/src/test/java/org/apache/gobblin/writer/AsyncHttpWriterTest.java +++ b/gobblin-modules/gobblin-http/src/test/java/org/apache/gobblin/writer/AsyncHttpWriterTest.java @@ -330,6 +330,7 @@ public class AsyncHttpWriterTest { this.responseHandler = responseHandler; this.state = new WorkUnitState(); this.queueCapacity = 2; + this.maxAttempts = 3; } @Override http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/1912eb79/mkdocs.yml ---------------------------------------------------------------------- diff --git a/mkdocs.yml b/mkdocs.yml index 7152bd0..b6a77f9 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -68,7 +68,7 @@ pages: - HDFS Byte array: sinks/SimpleBytesWriter.md - Console: sinks/ConsoleWriter.md - Couchbase: sinks/Couchbase-Writer.md - - HTTP and REST: sinks/REST-Writer.md + - HTTP: sinks/Http.md - JDBC: sinks/Gobblin-JDBC-Writer.md - Kafka: sinks/Kafka.md - Gobblin Adaptors:
