samredai commented on code in PR #4801:
URL: https://github.com/apache/iceberg/pull/4801#discussion_r924006469

##########
docs/configurations.md:
##########
@@ -0,0 +1,229 @@
+---
+title: "Configurations"
+url: configurations
+aliases:
+    - config
+menu: main
+weight: 1201
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Configurations
+
+### Table Read Properties
+
+| Property | Default | Description |
+| --------------------------------- | ------------------ | ------------------------------------------------------ |
+| read.split.target-size | 134217728 (128 MB) | Target size when combining data input splits |
+| read.split.metadata-target-size | 33554432 (32 MB) | Target size when combining metadata input splits |
+| read.split.planning-lookback | 10 | Number of bins to consider when combining input splits |
+| read.split.open-file-cost | 4194304 (4 MB) | The estimated cost to open a file, used as a minimum weight when combining splits |
+| read.parquet.vectorization.enabled | false | Enables Parquet vectorized reads |
+| read.parquet.vectorization.batch-size | 5000 | The batch size for Parquet vectorized reads |
+| read.orc.vectorization.enabled | false | Enables ORC vectorized reads |
+| read.orc.vectorization.batch-size | 5000 | The batch size for ORC vectorized reads |
+
+### Table Write Properties
+
+| Property | Default | Description |
+| ---------------------------------- | ------------------ | -------------------------------------------------- |
+| write.format.default | parquet | Default file format for the table; parquet, avro, or orc |
+| write.delete.format.default | data file format | Default delete file format for the table; parquet, avro, or orc |
+| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size |
+| write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
+| write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
+| write.parquet.compression-codec | gzip | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed |
+| write.parquet.compression-level | null | Parquet compression level |
+| write.avro.compression-codec | gzip | Avro compression codec: gzip (deflate with level 9), zstd, snappy, uncompressed |
+| write.avro.compression-level | null | Avro compression level |
+| write.orc.stripe-size-bytes | 67108864 (64 MB) | Default ORC stripe size, in bytes |
+| write.orc.block-size-bytes | 268435456 (256 MB) | Default file system block size for ORC files |
+| write.orc.compression-codec | zlib | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |
+| write.orc.compression-strategy | speed | ORC compression strategy: speed, compression |
+| write.location-provider.impl | null | Optional custom implementation for LocationProvider |
+| write.metadata.compression-codec | none | Metadata compression codec; none or gzip |
+| write.metadata.metrics.default | truncate(16) | Default metrics mode for all columns in the table; none, counts, truncate(length), or full |
+| write.metadata.metrics.column.col1 | (not set) | Metrics mode for column 'col1' to allow per-column tuning; none, counts, truncate(length), or full |
+| write.target-file-size-bytes | 536870912 (512 MB) | Controls the size of files generated to target about this many bytes |
+| write.delete.target-file-size-bytes | 67108864 (64 MB) | Controls the size of delete files generated to target about this many bytes |
+| write.distribution-mode | none | Defines distribution of write data: __none__: don't shuffle rows; __hash__: hash distribute by partition key; __range__: range distribute by partition key or sort key if the table has a SortOrder |
+| write.delete.distribution-mode | hash | Defines distribution of write delete data |
+| write.wap.enabled | false | Enables write-audit-publish writes |
+| write.summary.partition-limit | 0 | Includes partition-level summary stats in snapshot summaries if the changed partition count is less than this limit |
+| write.metadata.delete-after-commit.enabled | false | Controls whether to delete the oldest version metadata files after commit |
+| write.metadata.previous-versions-max | 100 | The max number of previous version metadata files to keep before deleting after commit |
+| write.spark.fanout.enabled | false | Enables the fanout writer in Spark that does not require data to be clustered; uses more memory |
+| write.object-storage.enabled | false | Enables the object storage location provider that adds a hash component to file paths |
+| write.data.path | table location + /data | Base location for data files |
+| write.metadata.path | table location + /metadata | Base location for metadata files |
+| write.delete.mode | copy-on-write | Mode used for delete commands: copy-on-write or merge-on-read (v2 only) |
+| write.delete.isolation-level | serializable | Isolation level for delete commands: serializable or snapshot |
+| write.update.mode | copy-on-write | Mode used for update commands: copy-on-write or merge-on-read (v2 only) |
+| write.update.isolation-level | serializable | Isolation level for update commands: serializable or snapshot |
+| write.merge.mode | copy-on-write | Mode used for merge commands: copy-on-write or merge-on-read (v2 only) |
+| write.merge.isolation-level | serializable | Isolation level for merge commands: serializable or snapshot |
+
+### Table Behavior Properties
+
+| Property | Default | Description |
+| ---------------------------------- | ---------------- | ------------------------------------------------------------- |
+| commit.retry.num-retries | 4 | Number of times to retry a commit before failing |
+| commit.retry.min-wait-ms | 100 | Minimum time in milliseconds to wait before retrying a commit |
+| commit.retry.max-wait-ms | 60000 (1 min) | Maximum time in milliseconds to wait before retrying a commit |
+| commit.retry.total-timeout-ms | 1800000 (30 min) | Total retry timeout period in milliseconds for a commit |
+| commit.status-check.num-retries | 3 | Number of times to check whether a commit succeeded after a connection is lost before failing due to an unknown commit state |
+| commit.status-check.min-wait-ms | 1000 (1s) | Minimum time in milliseconds to wait before retrying a status-check |
+| commit.status-check.max-wait-ms | 60000 (1 min) | Maximum time in milliseconds to wait before retrying a status-check |
+| commit.status-check.total-timeout-ms | 1800000 (30 min) | Total timeout period in which the commit status-check must succeed, in milliseconds |
+| commit.manifest.target-size-bytes | 8388608 (8 MB) | Target size when merging manifest files |
+| commit.manifest.min-count-to-merge | 100 | Minimum number of manifests to accumulate before merging |
+| commit.manifest-merge.enabled | true | Controls whether to automatically merge manifests on writes |
+| history.expire.max-snapshot-age-ms | 432000000 (5 days) | Default max age of snapshots to keep while expiring snapshots |
+| history.expire.min-snapshots-to-keep | 1 | Default min number of snapshots to keep while expiring snapshots |
+| history.expire.max-ref-age-ms | `Long.MAX_VALUE` (forever) | For snapshot references except the `main` branch, default max age of snapshot references to keep while expiring snapshots. The `main` branch never expires. |
+
+### Reserved Table Properties
+
+| Property | Default | Description |
+| -------------- | -------- | ------------------------------------------------------------- |
+| format-version | 1 | Table's format version (can be 1 or 2) as defined in the [Spec](../../../spec/#format-versioning). |
+
+{{< hint info >}}
+Reserved table properties are only used to control behaviors when creating or updating a table.
+The values of these properties are not persisted as part of the table metadata.
+{{< /hint >}}
+
+### Compatibility Flags
+
+| Property | Default | Description |
+| --------------------------------------------- | -------- | ------------------------------------------------------------- |
+| compatibility.snapshot-id-inheritance.enabled | false | Enables committing snapshots without explicit snapshot IDs |

Review Comment:
   Added `; Not required for v2 tables` to the description. 👍
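
As a quick illustration of how the keys documented above are applied in practice, here is a minimal, hypothetical sketch using the Iceberg Java API. The `table` handle, the example class, and the chosen values are assumptions for illustration only; the property names are the ones from the tables in this page.

```java
import org.apache.iceberg.Table;

// Assumed setup: `table` was loaded from a catalog beforehand,
// e.g. catalog.loadTable(TableIdentifier.of("db", "events")).
public class TablePropertiesExample {
    public static void tune(Table table) {
        table.updateProperties()
            .set("write.format.default", "orc")                // default data file format
            .set("write.target-file-size-bytes", "268435456")  // aim for ~256 MB data files
            .set("commit.retry.num-retries", "10")             // allow more commit retries before failing
            .commit();                                         // apply all property changes in one metadata commit
    }
}
```

Property values are passed as strings even for numeric settings, and the same keys can also be supplied when a table is created, for example through the properties map accepted by `Catalog.createTable` or `TBLPROPERTIES` in Spark SQL DDL.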
