ektravel commented on code in PR #14329:
URL: https://github.com/apache/druid/pull/14329#discussion_r1260526833
##########
docs/ingestion/input-sources.md:
##########
@@ -794,6 +794,194 @@ The following is an example of a Combining input source
spec:
...
```
+## Iceberg input source
+
+> You need to include the `druid-iceberg-extensions` as an extension to use the Iceberg input source.
+
+The Iceberg input source is used to read data stored in the Iceberg table format. For a given table, this input source scans up to the latest iceberg snapshot from the configured Hive catalog and the underlying live data files will be ingested using the existing input source formats available in Druid.
+
+The Iceberg input source cannot be independent as it relies on the existing input sources to perform the actual read from the Data files.
+For example, if the warehouse associated with an iceberg catalog is on `S3`, please ensure that the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension is also loaded.
+
+Sample specs:
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "iceberg",
+        "tableName": "iceberg_table",
+        "namespace": "iceberg_namespace",
+        "icebergCatalog": {
+          "type": "hive",
+          "warehousePath": "hdfs://warehouse/path",
+          "catalogUri": "thrift://hive-metastore.x.com:8970",
+          "catalogProperties": {
+            "hive.metastore.connect.retries": "1",
+            "hive.metastore.execute.setugi": "false",
+            "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+            "hive.metastore.sasl.enabled": "true",
+            "metastore.catalog.default": "catalog_test",
+            "hadoop.security.authentication": "kerberos",
+            "hadoop.security.authorization": "true"
+          }
+        },
+        "icebergFilter": {
+          "type": "interval",
+          "filterColumn": "event_time",
+          "intervals": [
+            "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+          ]
+        },
+        "warehouseSource": {
+          "type": "hdfs"
+        }
+      },
+      "inputFormat": {
+        "type": "parquet"
+      }
+    },
+    ...
+},
+...
+```
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "iceberg",
+        "tableName": "iceberg_table",
+        "namespace": "iceberg_namespace",
+        "icebergCatalog": {
+          "type": "hive",
+          "warehousePath": "hdfs://warehouse/path",
+          "catalogUri": "thrift://hive-metastore.x.com:8970",
+          "catalogProperties": {
+            "hive.metastore.connect.retries": "1",
+            "hive.metastore.execute.setugi": "false",
+            "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+            "hive.metastore.sasl.enabled": "true",
+            "metastore.catalog.default": "default_catalog",
+            "fs.s3a.access.key" : "S3_ACCESS_KEY",
+            "fs.s3a.secret.key" : "S3_SECRET_KEY",
+            "fs.s3a.endpoint" : "S3_API_ENDPOINT"
+          }
+        },
+        "icebergFilter": {
+          "type": "interval",
+          "filterColumn": "event_time",
+          "intervals": [
+            "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+          ]
+        },
+        "warehouseSource": {
+          "type": "s3",
+          "endpointConfig": {
+            "url": "teststore.aws.com",
+            "signingRegion": "us-west-2a"
+          },
+          "clientConfig": {
+            "protocol": "http",
+            "disableChunkedEncoding": true,
+            "enablePathStyleAccess": true,
+            "forceGlobalBucketAccessEnabled": false
+          },
+          "properties": {
+            "accessKeyId": {
+              "type": "default",
+              "password": "foo"
+            },
+            "secretAccessKey": {
+              "type": "default",
+              "password": "bar"
+            }
+          }
+        }
+      },
+      "inputFormat": {
+        "type": "parquet"
+      }
+    },
+...
+},
+```
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set the value to `iceberg`.|yes|
+|tableName|The iceberg table name configured in the catalog.|yes|
+|namespace|The iceberg namespace associated with the table|yes|
+|icebergFilter|JSON Object used to filter data files within a snapshot when reading|no|
+|icebergCatalog|JSON Object used to define the catalog that manages the configured iceberg table|yes|
+|warehouseSource|JSON Object used to indicate which native input source needs to be used to read the data files from the warehouse|yes|
+
+Catalog Object:
+
+There are two supported catalog types: `local` and `hive`
+
+Local catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `local`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
+|catalogProperties|Map of any additional properties that needs to be attached to the catalog|no|
+
+Hive Catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `hive`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
Review Comment:
```suggestion
|`warehousePath`|The location of the warehouse associated with the catalog.|Yes|
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]