This is an automated email from the ASF dual-hosted git repository.
cgivre pushed a commit to branch add_iceberg
in repository https://gitbox.apache.org/repos/asf/drill-site.git
The following commit(s) were added to refs/heads/add_iceberg by this push:
new 12f3181 Add Iceberg
12f3181 is described below
commit 12f318125ad9f3a48f4b61ee59b24164d6b0aa2a
Author: Charles Givre <[email protected]>
AuthorDate: Tue Mar 8 10:43:54 2022 -0500
Add Iceberg
---
.../125-iceburg-format-plugin.md | 123 +++++++++++++++++++++
1 file changed, 123 insertions(+)
diff --git
a/_docs/en/data-sources-and-file-formats/125-iceburg-format-plugin.md
b/_docs/en/data-sources-and-file-formats/125-iceburg-format-plugin.md
new file mode 100644
index 0000000..716ed5a
--- /dev/null
+++ b/_docs/en/data-sources-and-file-formats/125-iceburg-format-plugin.md
@@ -0,0 +1,123 @@
+---
+title: "Iceberg Format Plugin"
+slug: "Iceberg Format Plugin"
+parent: "Data Sources and File Formats"
+---
+
+**Introduced in release:** 1.20
+
+This format plugin enabled Drill to query Apache Iceberg tables.
+
+Unlike regular format plugins, the Iceberg table is a folder with data and
metadata files, but Drill checks the presence
+of the `metadata` folder to ensure that the table is Iceberg one.
+
+Drill supports reading all formats of Iceberg tables available at this moment:
Parquet, Avro, and ORC.
+No need to provide actual table format, it will be discovered automatically.
+
+For details related to Apache Iceberg table format, please refer to [official
docs](https://iceberg.apache.org/#).
+
+## Supported optimizations and features
+
+### Project pushdown
+
+This format plugin supports project and filter pushdown optimizations.
+
+For the case of project pushdown, only columns specified in the query will be
read, even they are nested columns. In
+conjunction with column-oriented formats like Parquet or ORC, it allows
improving reading performance significantly.
+
+### Filter pushdown
+
+For the case of filter pushdown, all expressions supported by Iceberg API will
be pushed down, so only data that matches
+the filter expression will be read.
+
+### Schema provisioning
+
+This format plugin supports the schema provisioning feature. Though Iceberg
provides table schema, in some cases, it
+might be useful to select data with customized schema, so it can be done using
the table function:
+
+```sql
+SELECT int_field,
+ string_field
+FROM table(dfs.tmp.testAllTypes(schema => 'inline=(int_field varchar not null
default `error`)'))
+```
+
+In this example, we convert int field to string and return `'error'` literals
for null values.
+
+### Querying table metadata
+
+Apache Drill provides the ability to query any kind of table metadata Iceberg
can return.
+
+At this point, Apache Iceberg has the following metadata kinds:
+
+* ENTRIES
+* FILES
+* HISTORY
+* SNAPSHOTS
+* MANIFESTS
+* PARTITIONS
+* ALL_DATA_FILES
+* ALL_MANIFESTS
+* ALL_ENTRIES
+
+To query specific metadata, just add the `#metadata_name` suffix to the table
location, like in the following example:
+
+```sql
+SELECT *
+FROM dfs.tmp.`testAllTypes#snapshots`
+```
+
+### Querying specific table versions (snapshots)
+
+Apache Iceberg has the ability to track the table modifications and read
specific version before or after modifications
+or modifications itself.
+
+This format plugin embraces this ability and provides an easy-to-use way of
triggering it.
+
+The following ways of specifying table version are supported:
+
+- `snapshotId` - id of the specific snapshot
+- `snapshotAsOfTime` - the most recent snapshot as of the given time in
milliseconds
+- `fromSnapshotId` - read appended data from `fromSnapshotId` exclusive to the
current snapshot inclusive
+- \[`fromSnapshotId` : `toSnapshotId`\] - read appended data from
`fromSnapshotId` exclusive to `toSnapshotId` inclusive
+
+Table function can be used to specify one of the above configs in the
following way:
+
+```sql
+SELECT *
+FROM table(dfs.tmp.testAllTypes(type => 'iceberg', snapshotId => 123456789));
+
+SELECT *
+FROM table(dfs.tmp.testAllTypes(type => 'iceberg', snapshotAsOfTime =>
1636231332000));
+
+SELECT *
+FROM table(dfs.tmp.testAllTypes(type => 'iceberg', fromSnapshotId =>
123456789));
+
+SELECT *
+FROM table(dfs.tmp.testAllTypes(type => 'iceberg', fromSnapshotId =>
123456789, toSnapshotId => 987654321));
+```
+
+## Configuration
+
+Format plugin has the following configuration options:
+
+- `type` - format plugin type, should be `'iceberg'`
+- `properties` - Iceberg-specific table properties. Please refer to
[Configuration](https://iceberg.apache.org/#configuration/) page for more
details
+- `caseSensitive` - whether table columns are case-sensitive
+
+### Format config example:
+
+```json
+{
+ "type": "file",
+ "formats": {
+ "iceberg": {
+ "type": "iceberg",
+ "properties": {
+ "read.split.target-size": "134217728",
+ "read.split.metadata-target-size": "33554432"
+ },
+ "caseSensitive": true
+ }
+ }
+}
+```
\ No newline at end of file