ektravel commented on code in PR #14329:
URL: https://github.com/apache/druid/pull/14329#discussion_r1260544200


##########
docs/ingestion/input-sources.md:
##########
@@ -794,6 +794,194 @@ The following is an example of a Combining input source 
spec:
 ...
 ```
 
+## Iceberg input source
+
+> You need to include the `druid-iceberg-extensions` as an extension to use 
the Iceberg input source.
+
+The Iceberg input source is used to read data stored in the Iceberg table 
format. For a given table, this input source scans up to the latest iceberg 
snapshot from the configured Hive catalog and the underlying live data files 
will be ingested using the existing input source formats available in Druid.
+
+The Iceberg input source cannot be independent as it relies on the existing 
input sources to perform the actual read from the Data files.
+For example, if the warehouse associated with an iceberg catalog is on `S3`, 
please ensure that the 
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension is also 
loaded.
+
+Sample specs:
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "iceberg",
+        "tableName": "iceberg_table",
+        "namespace": "iceberg_namespace",
+        "icebergCatalog": {
+            "type": "hive",
+            "warehousePath": "hdfs://warehouse/path",
+            "catalogUri": "thrift://hive-metastore.x.com:8970",
+            "catalogProperties": {
+                "hive.metastore.connect.retries": "1",
+                "hive.metastore.execute.setugi": "false",
+                "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+                "hive.metastore.sasl.enabled": "true",
+                "metastore.catalog.default": "catalog_test",
+                "hadoop.security.authentication": "kerberos",
+                "hadoop.security.authorization": "true"
+            }
+        },
+        "icebergFilter": {
+            "type": "interval",
+            "filterColumn": "event_time",
+            "intervals": [
+              "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+            ]
+        },
+        "warehouseSource": {
+            "type": "hdfs"
+        }
+      },
+      "inputFormat": {
+        "type": "parquet"
+      }
+  },
+      ...
+},
+...
+```
+
+```json
+...
+        "ioConfig": {
+          "type": "index_parallel",
+          "inputSource": {
+            "type": "iceberg",
+            "tableName": "iceberg_table",
+            "namespace": "iceberg_namespace",
+            "icebergCatalog": {
+              "type": "hive",
+              "warehousePath": "hdfs://warehouse/path",
+              "catalogUri": "thrift://hive-metastore.x.com:8970",
+              "catalogProperties": {
+                "hive.metastore.connect.retries": "1",
+                "hive.metastore.execute.setugi": "false",
+                "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+                "hive.metastore.sasl.enabled": "true",
+                "metastore.catalog.default": "default_catalog",
+                "fs.s3a.access.key" : "S3_ACCESS_KEY",
+                "fs.s3a.secret.key" : "S3_SECRET_KEY",
+                "fs.s3a.endpoint" : "S3_API_ENDPOINT"
+              }
+            },
+            "icebergFilter": {
+              "type": "interval",
+              "filterColumn": "event_time",
+              "intervals": [
+                "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+              ]
+            },
+            "warehouseSource": {
+              "type": "s3",
+              "endpointConfig": {
+                "url": "teststore.aws.com",
+                "signingRegion": "us-west-2a"
+              },
+              "clientConfig": {
+                "protocol": "http",
+                "disableChunkedEncoding": true,
+                "enablePathStyleAccess": true,
+                "forceGlobalBucketAccessEnabled": false
+              },
+              "properties": {
+                "accessKeyId": {
+                  "type": "default",
+                  "password": "foo"
+                },
+                "secretAccessKey": {
+                  "type": "default",
+                  "password": "bar"
+                }
+              },
+            }
+          },
+          "inputFormat": {
+            "type": "parquet"
+          }
+        },
+...
+},
+```
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set the value to `iceberg`.|yes|
+|tableName|The iceberg table name configured in the catalog.|yes|
+|namespace|The iceberg namespace associated with the table|yes|
+|icebergFilter|JSON Object used to filter data files within a snapshot when 
reading|no|
+|icebergCatalog|JSON Object used to define the catalog that manages the 
configured iceberg table|yes|
+|warehouseSource|JSON Object used to indicate which native input source needs 
to be used to read the data files from the warehouse|yes|
+
+Catalog Object:
+
+There are two supported catalog types: `local` and `hive`
+
+Local catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `local`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
+|catalogProperties|Map of any additional properties that needs to be attached 
to the catalog|no|
+
+Hive Catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `hive`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
+|catalogUri|The URI associated with the hive catalog|yes|
+|catalogProperties|Map of any additional properties that needs to be attached 
to the catalog|no|
+
+IcebergFilter Object:
+
+This input source provides filters: `and` , `equals` , `interval` and `or`. 
These filters can be used to filter out data files from a snapshot, thereby 
reducing the number of files Druid has to ingest.
+
+`equals` Filter:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `equals`.|yes|

Review Comment:
   ```suggestion
   |`type`|Set this value to `equals`.|Yes|
   ```



##########
docs/ingestion/input-sources.md:
##########
@@ -794,6 +794,194 @@ The following is an example of a Combining input source 
spec:
 ...
 ```
 
+## Iceberg input source
+
+> You need to include the `druid-iceberg-extensions` as an extension to use 
the Iceberg input source.
+
+The Iceberg input source is used to read data stored in the Iceberg table 
format. For a given table, this input source scans up to the latest iceberg 
snapshot from the configured Hive catalog and the underlying live data files 
will be ingested using the existing input source formats available in Druid.
+
+The Iceberg input source cannot be independent as it relies on the existing 
input sources to perform the actual read from the Data files.
+For example, if the warehouse associated with an iceberg catalog is on `S3`, 
please ensure that the 
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension is also 
loaded.
+
+Sample specs:
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "iceberg",
+        "tableName": "iceberg_table",
+        "namespace": "iceberg_namespace",
+        "icebergCatalog": {
+            "type": "hive",
+            "warehousePath": "hdfs://warehouse/path",
+            "catalogUri": "thrift://hive-metastore.x.com:8970",
+            "catalogProperties": {
+                "hive.metastore.connect.retries": "1",
+                "hive.metastore.execute.setugi": "false",
+                "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+                "hive.metastore.sasl.enabled": "true",
+                "metastore.catalog.default": "catalog_test",
+                "hadoop.security.authentication": "kerberos",
+                "hadoop.security.authorization": "true"
+            }
+        },
+        "icebergFilter": {
+            "type": "interval",
+            "filterColumn": "event_time",
+            "intervals": [
+              "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+            ]
+        },
+        "warehouseSource": {
+            "type": "hdfs"
+        }
+      },
+      "inputFormat": {
+        "type": "parquet"
+      }
+  },
+      ...
+},
+...
+```
+
+```json
+...
+        "ioConfig": {
+          "type": "index_parallel",
+          "inputSource": {
+            "type": "iceberg",
+            "tableName": "iceberg_table",
+            "namespace": "iceberg_namespace",
+            "icebergCatalog": {
+              "type": "hive",
+              "warehousePath": "hdfs://warehouse/path",
+              "catalogUri": "thrift://hive-metastore.x.com:8970",
+              "catalogProperties": {
+                "hive.metastore.connect.retries": "1",
+                "hive.metastore.execute.setugi": "false",
+                "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+                "hive.metastore.sasl.enabled": "true",
+                "metastore.catalog.default": "default_catalog",
+                "fs.s3a.access.key" : "S3_ACCESS_KEY",
+                "fs.s3a.secret.key" : "S3_SECRET_KEY",
+                "fs.s3a.endpoint" : "S3_API_ENDPOINT"
+              }
+            },
+            "icebergFilter": {
+              "type": "interval",
+              "filterColumn": "event_time",
+              "intervals": [
+                "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+              ]
+            },
+            "warehouseSource": {
+              "type": "s3",
+              "endpointConfig": {
+                "url": "teststore.aws.com",
+                "signingRegion": "us-west-2a"
+              },
+              "clientConfig": {
+                "protocol": "http",
+                "disableChunkedEncoding": true,
+                "enablePathStyleAccess": true,
+                "forceGlobalBucketAccessEnabled": false
+              },
+              "properties": {
+                "accessKeyId": {
+                  "type": "default",
+                  "password": "foo"
+                },
+                "secretAccessKey": {
+                  "type": "default",
+                  "password": "bar"
+                }
+              },
+            }
+          },
+          "inputFormat": {
+            "type": "parquet"
+          }
+        },
+...
+},
+```
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set the value to `iceberg`.|yes|
+|tableName|The iceberg table name configured in the catalog.|yes|
+|namespace|The iceberg namespace associated with the table|yes|
+|icebergFilter|JSON Object used to filter data files within a snapshot when 
reading|no|
+|icebergCatalog|JSON Object used to define the catalog that manages the 
configured iceberg table|yes|
+|warehouseSource|JSON Object used to indicate which native input source needs 
to be used to read the data files from the warehouse|yes|
+
+Catalog Object:
+
+There are two supported catalog types: `local` and `hive`
+
+Local catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `local`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
+|catalogProperties|Map of any additional properties that needs to be attached 
to the catalog|no|
+
+Hive Catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `hive`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
+|catalogUri|The URI associated with the hive catalog|yes|
+|catalogProperties|Map of any additional properties that needs to be attached 
to the catalog|no|
+
+IcebergFilter Object:
+
+This input source provides filters: `and` , `equals` , `interval` and `or`. 
These filters can be used to filter out data files from a snapshot, thereby 
reducing the number of files Druid has to ingest.
+
+`equals` Filter:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `equals`.|yes|
+|filterColumn|The column name from the iceberg table schema based on which 
filtering needs to happen|yes|
+|filterValue|The value to filter on|yes|
+
+`interval` Filter:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `interval`.|yes|

Review Comment:
   ```suggestion
   |`type`|Set this value to `interval`.|Yes|
   ```



##########
docs/ingestion/input-sources.md:
##########
@@ -794,6 +794,194 @@ The following is an example of a Combining input source 
spec:
 ...
 ```
 
+## Iceberg input source
+
+> You need to include the `druid-iceberg-extensions` as an extension to use 
the Iceberg input source.
+
+The Iceberg input source is used to read data stored in the Iceberg table 
format. For a given table, this input source scans up to the latest iceberg 
snapshot from the configured Hive catalog and the underlying live data files 
will be ingested using the existing input source formats available in Druid.
+
+The Iceberg input source cannot be independent as it relies on the existing 
input sources to perform the actual read from the Data files.
+For example, if the warehouse associated with an iceberg catalog is on `S3`, 
please ensure that the 
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension is also 
loaded.
+
+Sample specs:
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "iceberg",
+        "tableName": "iceberg_table",
+        "namespace": "iceberg_namespace",
+        "icebergCatalog": {
+            "type": "hive",
+            "warehousePath": "hdfs://warehouse/path",
+            "catalogUri": "thrift://hive-metastore.x.com:8970",
+            "catalogProperties": {
+                "hive.metastore.connect.retries": "1",
+                "hive.metastore.execute.setugi": "false",
+                "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+                "hive.metastore.sasl.enabled": "true",
+                "metastore.catalog.default": "catalog_test",
+                "hadoop.security.authentication": "kerberos",
+                "hadoop.security.authorization": "true"
+            }
+        },
+        "icebergFilter": {
+            "type": "interval",
+            "filterColumn": "event_time",
+            "intervals": [
+              "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+            ]
+        },
+        "warehouseSource": {
+            "type": "hdfs"
+        }
+      },
+      "inputFormat": {
+        "type": "parquet"
+      }
+  },
+      ...
+},
+...
+```
+
+```json
+...
+        "ioConfig": {
+          "type": "index_parallel",
+          "inputSource": {
+            "type": "iceberg",
+            "tableName": "iceberg_table",
+            "namespace": "iceberg_namespace",
+            "icebergCatalog": {
+              "type": "hive",
+              "warehousePath": "hdfs://warehouse/path",
+              "catalogUri": "thrift://hive-metastore.x.com:8970",
+              "catalogProperties": {
+                "hive.metastore.connect.retries": "1",
+                "hive.metastore.execute.setugi": "false",
+                "hive.metastore.kerberos.principal": "KRB_PRINCIPAL",
+                "hive.metastore.sasl.enabled": "true",
+                "metastore.catalog.default": "default_catalog",
+                "fs.s3a.access.key" : "S3_ACCESS_KEY",
+                "fs.s3a.secret.key" : "S3_SECRET_KEY",
+                "fs.s3a.endpoint" : "S3_API_ENDPOINT"
+              }
+            },
+            "icebergFilter": {
+              "type": "interval",
+              "filterColumn": "event_time",
+              "intervals": [
+                "2023-05-10T19:00:00.000Z/2023-05-10T20:00:00.000Z"
+              ]
+            },
+            "warehouseSource": {
+              "type": "s3",
+              "endpointConfig": {
+                "url": "teststore.aws.com",
+                "signingRegion": "us-west-2a"
+              },
+              "clientConfig": {
+                "protocol": "http",
+                "disableChunkedEncoding": true,
+                "enablePathStyleAccess": true,
+                "forceGlobalBucketAccessEnabled": false
+              },
+              "properties": {
+                "accessKeyId": {
+                  "type": "default",
+                  "password": "foo"
+                },
+                "secretAccessKey": {
+                  "type": "default",
+                  "password": "bar"
+                }
+              },
+            }
+          },
+          "inputFormat": {
+            "type": "parquet"
+          }
+        },
+...
+},
+```
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set the value to `iceberg`.|yes|
+|tableName|The iceberg table name configured in the catalog.|yes|
+|namespace|The iceberg namespace associated with the table|yes|
+|icebergFilter|JSON Object used to filter data files within a snapshot when 
reading|no|
+|icebergCatalog|JSON Object used to define the catalog that manages the 
configured iceberg table|yes|
+|warehouseSource|JSON Object used to indicate which native input source needs 
to be used to read the data files from the warehouse|yes|
+
+Catalog Object:
+
+There are two supported catalog types: `local` and `hive`
+
+Local catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `local`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
+|catalogProperties|Map of any additional properties that needs to be attached 
to the catalog|no|
+
+Hive Catalog:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `hive`.|yes|
+|warehousePath|The location of the warehouse associated with the catalog|yes|
+|catalogUri|The URI associated with the hive catalog|yes|
+|catalogProperties|Map of any additional properties that needs to be attached 
to the catalog|no|
+
+IcebergFilter Object:
+
+This input source provides filters: `and` , `equals` , `interval` and `or`. 
These filters can be used to filter out data files from a snapshot, thereby 
reducing the number of files Druid has to ingest.
+
+`equals` Filter:
+
+|Property|Description|Required|
+|--------|-----------|---------|
+|type|Set this value to `equals`.|yes|
+|filterColumn|The column name from the iceberg table schema based on which 
filtering needs to happen|yes|
+|filterValue|The value to filter on|yes|

Review Comment:
   ```suggestion
   |`filterValue`|The value to filter on.|Yes|
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to