hubcio commented on code in PR #41: URL: https://github.com/apache/iggy-website/pull/41#discussion_r3180779548
########## content/docs/connectors/introduction.mdx: ##########

@@ -27,7 +27,7 @@ The connector runtime operates in two directions - **source** (ingest) and **sin
 | Type | Connectors |
 |--------|----------------------------------------------------------------|
 | Source | PostgreSQL, Elasticsearch, Random |
-| Sink | PostgreSQL, MongoDB, Elasticsearch, Quickwit, Apache Iceberg, Stdout |
+| Sink | PostgreSQL, MongoDB, Elasticsearch, Quickwit, Apache Iceberg, Generic HTTP, Stdout |

Review Comment:
   the table cell here says "Generic HTTP" but the page it links to has frontmatter title "HTTP Sink" and never uses the word "Generic" anywhere in the body. clicking through gives the reader two different names for the same thing. either rename the new page's frontmatter to something like "HTTP Sink (Generic)" and add a one-liner in the opening paragraph noting it's the transport-level connector (vs Elasticsearch/Quickwit which speak HTTP under the hood), or drop "Generic" from this row.

########## content/docs/connectors/sinks/http.mdx: ##########

@@ -0,0 +1,179 @@
+---
+title: HTTP Sink
+---
+
+<Callout type="info">
+  This page is a curated subset of the HTTP Sink documentation. The canonical reference — including deployment patterns, performance tuning, and the full set of configuration options and limitations — is the upstream [`http_sink/README.md`](https://github.com/apache/iggy/tree/master/core/connectors/sinks/http_sink) in the `apache/iggy` repository.
+</Callout>
+
+The HTTP Sink connector consumes messages from Iggy streams and delivers them to any HTTP endpoint — webhooks, REST APIs, serverless functions, or SaaS integrations. It is generic by design: you bring the URL, headers, and batching strategy, and the sink handles transport, retries, and metadata wrapping.
+
+## Configuration
+
+```toml
+[[streams]]
+stream = "events"
+topics = ["notifications"]
+schema = "json"
+batch_length = 50
+poll_interval = "100ms"
+consumer_group = "http_sink"
+
+[plugin_config]
+url = "https://api.example.com/ingest"
+batch_mode = "ndjson"
+```
+
+### Configuration Options
+
+| Option | Type | Default | Description |
+| ------ | ---- | ------- | ----------- |
+| `url` | string | **required** | Target URL for HTTP requests |
+| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, `PATCH`, `DELETE` |
+| `timeout` | string | `30s` | Request timeout (e.g. `10s`, `500ms`) |
+| `max_payload_size_bytes` | u64 | `10485760` | Max body size in bytes (10MB). `0` to disable |
+| `batch_mode` | string | `individual` | `individual`, `ndjson`, `json_array`, or `raw` |
+| `include_metadata` | bool | `true` | Wrap payload in metadata envelope |
+| `include_checksum` | bool | `false` | Add message checksum to metadata |
+| `include_origin_timestamp` | bool | `false` | Add origin timestamp to metadata |
+| `health_check_enabled` | bool | `false` | Send health check request in `open()` |
+| `health_check_method` | string | `HEAD` | HTTP method for health check |
+| `max_retries` | u32 | `3` | Retry attempts for transient errors |
+| `retry_delay` | string | `1s` | Base delay between retries |
+| `retry_backoff_multiplier` | u32 | `2` | Exponential backoff multiplier (min 1) |
+| `max_retry_delay` | string | `30s` | Maximum retry delay cap |
+| `success_status_codes` | [u16] | `[200, 201, 202, 204]` | Status codes considered successful |
+| `tls_danger_accept_invalid_certs` | bool | `false` | Skip TLS certificate validation |
+| `max_connections` | usize | `10` | Max idle connections per host |
+| `verbose_logging` | bool | `false` | Log request/response details at debug level |
+| `headers` | table | `{}` | Custom HTTP headers (e.g. `Authorization`) |
+
+## Batch Modes
+
+The `batch_mode` option controls how messages from one poll cycle are delivered to the endpoint.
+
+- **`individual`** (default): one HTTP request per message. Best for webhooks and endpoints that accept single events. With `batch_length = 50`, this produces 50 sequential round trips per poll cycle.
+- **`ndjson`**: all messages in one request, [newline-delimited JSON](https://github.com/ndjson/ndjson-spec). Best for bulk-ingestion endpoints. `Content-Type: application/x-ndjson`.
+- **`json_array`**: all messages as a single JSON array. Best for APIs that expect array payloads. `Content-Type: application/json`.
+- **`raw`**: raw bytes, one request per message. For non-JSON payloads (Protobuf, FlatBuffers, binary). The metadata envelope is not applied. `Content-Type: application/octet-stream`.
+
+For production throughput, prefer `ndjson` or `json_array` over `individual` — they collapse N round trips per poll cycle into one.
+
+## Metadata Envelope
+
+When `include_metadata = true` (default), the JSON-mode payload is wrapped:
+
+```json
+{
+  "metadata": {
+    "iggy_id": "0123456789abcdef0123456789abcdef",
+    "iggy_offset": 42,
+    "iggy_timestamp": 1710064800000000,
+    "iggy_stream": "my_stream",
+    "iggy_topic": "my_topic",
+    "iggy_partition_id": 0
+  },
+  "payload": { ... }
+}
+```
+
+- `iggy_id` is a 32-character lowercase hex string (no dashes).
+- For non-JSON payloads (`raw`, `flatbuffer`, `proto` schemas), the payload is base64-encoded and an `iggy_payload_encoding: "base64"` field is added.
+- Set `include_metadata = false` to send the raw payload without wrapping (useful when the downstream service expects bare JSON, e.g. Slack webhooks).
+
+The connector does not require any particular message structure on input. The envelope is applied on the way out, not expected on the way in — your producers can publish whatever they like.
+
+## Authentication
+
+The HTTP sink supports authentication via custom headers under `[plugin_config.headers]`. All headers are sent with every request, including health checks.
+
+```toml
+# Bearer token
+[plugin_config.headers]
+Authorization = "Bearer eyJhbGciOiJSUzI1NiIs..."
+
+# API key
+[plugin_config.headers]
+x-api-key = "my-secret-api-key"
+
+# Basic auth (base64 of "username:password")
+[plugin_config.headers]
+Authorization = "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
+```
+
+Multiple headers are supported and combined per request. For secrets, prefer environment variable overrides at the process level (see the upstream README) to keep tokens out of `config.toml`.
+
+## Retry & Delivery Semantics
+
+Failed requests use `reqwest-middleware` with exponential backoff: `retry_delay`, then `retry_delay * retry_backoff_multiplier`, capped at `max_retry_delay`.
+
+- **Transient errors** (retried): network errors, HTTP 429, 500, 502, 503, 504.
+- **Non-transient errors** (fail immediately): HTTP 400, 401, 403, 404, 405, etc.
+- **HTTP 429 `Retry-After`**: the header is logged but not honored; the middleware uses computed backoff.
+- **Partial delivery** (`individual`/`raw` modes): after 3 consecutive HTTP failures, the remainder of the batch is aborted to avoid hammering a dead endpoint.
+
+The connector runtime commits consumer-group offsets *before* `consume()` is called and does not inspect its return value, so the effective delivery guarantee is **at-most-once** at the runtime level. The sink's internal retry loop provides best-effort delivery within each `consume()` call.
+
+## Example Configurations
+
+### Webhook (Slack)
+
+```toml
+[plugin_config]
+url = "https://hooks.slack.com/services/T00/B00/xxx"
+batch_mode = "individual"
+include_metadata = false # Slack expects bare JSON payload
+```

Review Comment:
   the comment "Slack expects bare JSON payload" is true but easy to misread. flipping `include_metadata = false` doesn't transform arbitrary payloads into Slack's `{"text": "..."}` shape - the sink does no payload transformation on outbound. add a one-liner: "your producer must publish Slack-compatible JSON; the sink does not transform payloads."

########## content/docs/connectors/sinks/http.mdx: ##########

@@ -0,0 +1,179 @@
+| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, `PATCH`, `DELETE` |

Review Comment:
   the method list includes `GET` and `HEAD` without flagging that those methods with non-`individual` batch modes produce a warning at runtime ("may be rejected by the server") since GET/HEAD don't conventionally carry bodies. either drop `GET`/`HEAD` from this list (rarely useful for a sink) or add a one-liner about the batch-mode interaction.
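To make the interaction concrete, here is a minimal sketch of the kind of config check that comment describes. This is illustrative only: the function name, warning text, and structure are assumptions for exposition, not the actual `lib.rs` implementation.

```python
# Illustrative sketch of the GET/HEAD vs. batch-mode warning described in the
# review comment above; NOT the actual http_sink implementation.

BODYLESS_METHODS = {"GET", "HEAD"}  # methods that conventionally carry no request body


def config_warnings(method: str, batch_mode: str) -> list[str]:
    """Return warnings for method/batch-mode combinations where the sink
    would attach a request body to a conventionally body-less method."""
    warnings = []
    if method.upper() in BODYLESS_METHODS and batch_mode != "individual":
        warnings.append(
            f"{method} with batch_mode={batch_mode!r} attaches a request body "
            "and may be rejected by the server"
        )
    return warnings
```

Under these assumptions, `config_warnings("GET", "ndjson")` produces a warning while `config_warnings("POST", "ndjson")` does not.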
########## content/docs/connectors/sinks/http.mdx: ##########

@@ -0,0 +1,179 @@
+- **`individual`** (default): one HTTP request per message. Best for webhooks and endpoints that accept single events. With `batch_length = 50`, this produces 50 sequential round trips per poll cycle.
+- **`raw`**: raw bytes, one request per message. For non-JSON payloads (Protobuf, FlatBuffers, binary). The metadata envelope is not applied. `Content-Type: application/octet-stream`.
+
+For production throughput, prefer `ndjson` or `json_array` over `individual` — they collapse N round trips per poll cycle into one.

Review Comment:
   `raw` has the same N-round-trips problem as `individual` (one HTTP request per message - confirmed at `send_raw` in `lib.rs`) but isn't called out alongside in this throughput note. either include `raw` here or be explicit that `raw` shares the per-message cost.

########## content/docs/connectors/sinks/http.mdx: ##########

@@ -0,0 +1,179 @@
+- For non-JSON payloads (`raw`, `flatbuffer`, `proto` schemas), the payload is base64-encoded and an `iggy_payload_encoding: "base64"` field is added.

Review Comment:
   the prose says "an `iggy_payload_encoding: "base64"` field is added" but never shows the actual JSON shape. for `raw`/`flatbuffer`/`proto` schemas the sink emits the payload as `{"data": "<base64>", "iggy_payload_encoding": "base64"}` (see `EncodedPayload` struct). worth a 4-line example block - non-obvious that the bytes live in a `data` field.

########## content/docs/connectors/sinks/http.mdx: ##########

@@ -0,0 +1,179 @@
+### Configuration Options
+
+| Option | Type | Default | Description |
+| ------ | ---- | ------- | ----------- |

Review Comment:
   this whole config table is a verbatim copy of the upstream `core/connectors/sinks/http_sink/README.md`. defaults, types, and the transient retry codes (429/500/502/503/504 a few sections down) are all hardcoded constants in `lib.rs` and will silently drift the moment that file changes. options: trim this to the 5-6 most-used knobs plus a clear pointer to the upstream README for the full list, or add a CI step that diffs this table against the upstream README on each build. note `MAX_CONSECUTIVE_FAILURES = 3` is also a const but only surfaced in prose - if it changes upstream nothing flags it here. personally, i prefer linking upstream README.

########## content/docs/connectors/sinks/http.mdx: ##########

@@ -0,0 +1,179 @@
+| `success_status_codes` | [u16] | `[200, 201, 202, 204]` | Status codes considered successful |

Review Comment:
   the description is fine but misses a useful behavior: codes in this set are also never retried, even normally-transient ones like 429. so users who want to treat 429 as "queued/accepted" can put 429 here and it will short-circuit retries. worth a sentence on this - it's a non-obvious knob.

--
This is an automated message from the Apache Git Service.
