omarsmak commented on code in PR #945:
URL: https://github.com/apache/polaris/pull/945#discussion_r1942444745
##########
polaris-core/src/main/resources/schemas/policies/system/data-compaction/2025-02-03.json:
##########
@@ -0,0 +1,41 @@
+{
+ "license": "Licensed under the Apache License, Version 2.0
(http://www.apache.org/licenses/LICENSE-2.0)",
+ "$id":
"https://polaris.apache.org/schemas/policies/system/data-compaction/2025-02-03.json",
+ "title": "Data Compaction Policy",
+ "description": "Inheritable Polaris policy schema for Iceberg table data
compaction.",
+ "type": "object",
+ "properties": {
+ "version": {
+ "type": "string",
+ "const": "2025-02-03",
+ "description": "Schema version."
+ },
+ "enable": {
+ "type": "boolean",
+ "description": "Enable or disable data compaction."
+ },
+ "target_file_size_bytes": {
Review Comment:
I still have reservations about including `target_file_size_bytes` in the
main schema due to the potential issues that I highlighted in the design doc
earlier. Here, I’d like to reiterate my reasoning:
1. **Iceberg’s Source-of-Truth Property**:
Iceberg already exposes a `write.target-file-size-bytes` property, which is
the most appropriate configuration to define target file size across
operations. Engines are expected to comply with this property as the source of
truth.
By introducing `target_file_size_bytes` in Polaris as a separate Policy
configuration, we risk creating disparity between the two settings.
2. **Conflicting Policies Between Polaris and Iceberg**:
If a user sets a different value for `target_file_size_bytes` in Polaris
(compared to `write.target-file-size-bytes` in Iceberg), it introduces the
following risks:
- In cases where files are compacted manually (i.e., not through TMS),
the manual operation would likely respect Iceberg's
`write.target-file-size-bytes` setting.
- This mismatch could result in unintended behavior, such as a complete
rewrite of the table (critical for large tables with TBs of data).
I understand the potential reasons for adding this configuration in Polaris.
Given how the `config` map is fully optional and extensible, maybe it might
make sense to rely on this `config` field instead to set the target file size
there and allow the engine to interpret and manage its policies as needed.
Wdyt?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]