bryanck commented on code in PR #5345:
URL: https://github.com/apache/iceberg/pull/5345#discussion_r930214751
##########
docs/configuration.md:
##########
@@ -46,45 +46,46 @@ Iceberg tables support table properties to configure table behavior, like the de
### Write properties
-| Property                           | Default            | Description                                        |
-| ---------------------------------- | ------------------ | -------------------------------------------------- |
-| write.format.default               | parquet            | Default file format for the table; parquet, avro, or orc |
-| write.delete.format.default        | data file format   | Default delete file format for the table; parquet, avro, or orc |
-| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size |
-| write.parquet.page-size-bytes      | 1048576 (1 MB)     | Parquet page size |
-| write.parquet.dict-size-bytes      | 2097152 (2 MB)     | Parquet dictionary page size |
-| write.parquet.compression-codec    | gzip               | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed |
-| write.parquet.compression-level    | null               | Parquet compression level |
-| write.parquet.bloom-filter-enabled.column.col1 | (not set) | Enables writing a bloom filter for the column: col1 |
-| write.parquet.bloom-filter-max-bytes | 1048576 (1 MB)   | The maximum number of bytes for a bloom filter bitset |
-| write.avro.compression-codec       | gzip               | Avro compression codec: gzip(deflate with 9 level), zstd, snappy, uncompressed |
-| write.avro.compression-level       | null               | Avro compression level |
-| write.orc.stripe-size-bytes        | 67108864 (64 MB)   | Define the default ORC stripe size, in bytes |
-| write.orc.block-size-bytes         | 268435456 (256 MB) | Define the default file system block size for ORC files |
-| write.orc.compression-codec        | zlib               | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |
-| write.orc.compression-strategy     | speed              | ORC compression strategy: speed, compression |
-| write.location-provider.impl       | null               | Optional custom implementation for LocationProvider |
-| write.metadata.compression-codec   | none               | Metadata compression codec; none or gzip |
-| write.metadata.metrics.default     | truncate(16)       | Default metrics mode for all columns in the table; none, counts, truncate(length), or full |
-| write.metadata.metrics.column.col1 | (not set)          | Metrics mode for column 'col1' to allow per-column tuning; none, counts, truncate(length), or full |
-| write.target-file-size-bytes       | 536870912 (512 MB) | Controls the size of files generated to target about this many bytes |
-| write.delete.target-file-size-bytes| 67108864 (64 MB)   | Controls the size of delete files generated to target about this many bytes |
-| write.distribution-mode            | none               | Defines distribution of write data: __none__: don't shuffle rows; __hash__: hash distribute by partition key ; __range__: range distribute by partition key or sort key if table has an SortOrder |
-| write.delete.distribution-mode     | hash               | Defines distribution of write delete data |
-| write.wap.enabled                  | false              | Enables write-audit-publish writes |
-| write.summary.partition-limit      | 0                  | Includes partition-level summary stats in snapshot summaries if the changed partition count is less than this limit |
-| write.metadata.delete-after-commit.enabled | false      | Controls whether to delete the oldest version metadata files after commit |
-| write.metadata.previous-versions-max | 100              | The max number of previous version metadata files to keep before deleting after commit |
-| write.spark.fanout.enabled         | false              | Enables the fanout writer in Spark that does not require data to be clustered; uses more memory |
-| write.object-storage.enabled       | false              | Enables the object storage location provider that adds a hash component to file paths |
-| write.data.path                    | table location + /data | Base location for data files |
-| write.metadata.path                | table location + /metadata | Base location for metadata files |
-| write.delete.mode                  | copy-on-write      | Mode used for delete commands: copy-on-write or merge-on-read (v2 only) |
-| write.delete.isolation-level       | serializable       | Isolation level for delete commands: serializable or snapshot |
-| write.update.mode                  | copy-on-write      | Mode used for update commands: copy-on-write or merge-on-read (v2 only) |
-| write.update.isolation-level       | serializable       | Isolation level for update commands: serializable or snapshot |
-| write.merge.mode                   | copy-on-write      | Mode used for merge commands: copy-on-write or merge-on-read (v2 only) |
-| write.merge.isolation-level        | serializable       | Isolation level for merge commands: serializable or snapshot |
+| Property                                       | Default                    | Description                                                                                        |
+|------------------------------------------------|----------------------------|----------------------------------------------------------------------------------------------------|
+| write.format.default                           | parquet                    | Default file format for the table; parquet, avro, or orc |
+| write.delete.format.default                    | data file format           | Default delete file format for the table; parquet, avro, or orc |
+| write.parquet.row-group-size-bytes             | 134217728 (128 MB)         | Parquet row group size |
+| write.parquet.page-size-bytes                  | 1048576 (1 MB)             | Parquet page size |
+| write.parquet.page-row-count-limit             | 20000                      | Parquet page row count limit |
+| write.parquet.dict-size-bytes                  | 2097152 (2 MB)             | Parquet dictionary page size |
+| write.parquet.compression-codec                | gzip                       | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed |
+| write.parquet.compression-level                | null                       | Parquet compression level |
+| write.parquet.bloom-filter-enabled.column.col1 | (not set)                  | Enables writing a bloom filter for the column: col1 |
+| write.parquet.bloom-filter-max-bytes           | 1048576 (1 MB)             | The maximum number of bytes for a bloom filter bitset |
+| write.avro.compression-codec                   | gzip                       | Avro compression codec: gzip(deflate with 9 level), zstd, snappy, uncompressed |
+| write.avro.compression-level                   | null                       | Avro compression level |
+| write.orc.stripe-size-bytes                    | 67108864 (64 MB)           | Define the default ORC stripe size, in bytes |
+| write.orc.block-size-bytes                     | 268435456 (256 MB)         | Define the default file system block size for ORC files |
+| write.orc.compression-codec                    | zlib                       | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |
+| write.orc.compression-strategy                 | speed                      | ORC compression strategy: speed, compression |
+| write.location-provider.impl                   | null                       | Optional custom implementation for LocationProvider |
+| write.metadata.compression-codec               | none                       | Metadata compression codec; none or gzip |
+| write.metadata.metrics.default                 | truncate(16)               | Default metrics mode for all columns in the table; none, counts, truncate(length), or full |
+| write.metadata.metrics.column.col1             | (not set)                  | Metrics mode for column 'col1' to allow per-column tuning; none, counts, truncate(length), or full |
+| write.target-file-size-bytes                   | 536870912 (512 MB)         | Controls the size of files generated to target about this many bytes |
+| write.delete.target-file-size-bytes            | 67108864 (64 MB)           | Controls the size of delete files generated to target about this many bytes |
+| write.distribution-mode                        | none                       | Defines distribution of write data: __none__: don't shuffle rows; __hash__: hash distribute by partition key ; __range__: range distribute by partition key or sort key if table has an SortOrder |
+| write.delete.distribution-mode                 | hash                       | Defines distribution of write delete data |
+| write.wap.enabled                              | false                      | Enables write-audit-publish writes |
+| write.summary.partition-limit                  | 0                          | Includes partition-level summary stats in snapshot summaries if the changed partition count is less than this limit |
+| write.metadata.delete-after-commit.enabled     | false                      | Controls whether to delete the oldest version metadata files after commit |
+| write.metadata.previous-versions-max           | 100                        | The max number of previous version metadata files to keep before deleting after commit |
+| write.spark.fanout.enabled                     | false                      | Enables the fanout writer in Spark that does not require data to be clustered; uses more memory |
+| write.object-storage.enabled                   | false                      | Enables the object storage location provider that adds a hash component to file paths |
+| write.data.path                                | table location + /data     | Base location for data files |
+| write.metadata.path                            | table location + /metadata | Base location for metadata files |
+| write.delete.mode                              | copy-on-write              | Mode used for delete commands: copy-on-write or merge-on-read (v2 only) |
+| write.delete.isolation-level                   | serializable               | Isolation level for delete commands: serializable or snapshot |
+| write.update.mode                              | copy-on-write              | Mode used for update commands: copy-on-write or merge-on-read (v2 only) |
+| write.update.isolation-level                   | serializable               | Isolation level for update commands: serializable or snapshot |
+| write.merge.mode                               | copy-on-write              | Mode used for merge commands: copy-on-write or merge-on-read (v2 only) |
+| write.merge.isolation-level                    | serializable               | Isolation level for merge commands: serializable or snapshot |
Review Comment:
Sure, this is done (my editor got a little aggressive).
##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -143,6 +143,10 @@ private TableProperties() {
 public static final String DELETE_PARQUET_PAGE_SIZE_BYTES =
     "write.delete.parquet.page-size-bytes";
 public static final int PARQUET_PAGE_SIZE_BYTES_DEFAULT = 1024 * 1024; // 1 MB
+ public static final String PARQUET_PAGE_ROW_COUNT_LIMIT =
+     "write.parquet.page-row-count-limit";
Review Comment:
This is done
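For context, here is a minimal, self-contained sketch of how a table property such as the new `write.parquet.page-row-count-limit` is typically resolved against its compile-time default. The helper below mirrors the shape of Iceberg's `PropertyUtil.propertyAsInt` lookup, but the class name and the simplified parsing are assumptions for illustration, not Iceberg's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: resolves a write property from a table's
// property map, falling back to the documented default when unset.
public class PagePropertyDemo {
  // Name and default match the values added in this PR's diff.
  static final String PARQUET_PAGE_ROW_COUNT_LIMIT = "write.parquet.page-row-count-limit";
  static final int PARQUET_PAGE_ROW_COUNT_LIMIT_DEFAULT = 20_000;

  // Simplified stand-in for PropertyUtil.propertyAsInt(props, name, default).
  static int propertyAsInt(Map<String, String> props, String name, int defaultValue) {
    String value = props.get(name);
    return value != null ? Integer.parseInt(value) : defaultValue;
  }

  public static void main(String[] args) {
    Map<String, String> tableProps = new HashMap<>();

    // Unset: falls back to the documented default of 20000.
    System.out.println(propertyAsInt(tableProps, PARQUET_PAGE_ROW_COUNT_LIMIT,
        PARQUET_PAGE_ROW_COUNT_LIMIT_DEFAULT));

    // Explicitly set on the table (e.g. via ALTER TABLE ... SET TBLPROPERTIES).
    tableProps.put(PARQUET_PAGE_ROW_COUNT_LIMIT, "5000");
    System.out.println(propertyAsInt(tableProps, PARQUET_PAGE_ROW_COUNT_LIMIT,
        PARQUET_PAGE_ROW_COUNT_LIMIT_DEFAULT));
  }
}
```

This default-with-override pattern is why the docs table and the `TableProperties` constant must agree: readers tune the property per table, and the writer falls back to the constant when it is absent.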
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]