wgtmac commented on code in PR #682:
URL: https://github.com/apache/iceberg-cpp/pull/682#discussion_r3322779116
##########
src/iceberg/util/snapshot_util.cc:
##########
@@ -375,7 +375,7 @@ Result<std::shared_ptr<Schema>>
SnapshotUtil::SchemaFor(const Table& table,
const auto& metadata = table.metadata();
auto it = metadata->refs.find(ref);
- if (it == metadata->refs.cend() || it->second->type() ==
SnapshotRefType::kBranch) {
Review Comment:
Why here and below are changed? I checked that they still exist in the Java
implementation.
##########
src/iceberg/snapshot.h:
##########
@@ -359,6 +368,10 @@ class ICEBERG_EXPORT SnapshotSummaryBuilder {
int32_t max_changed_partitions_for_summaries_{0};
int64_t deleted_duplicate_files_{0};
bool trust_partition_metrics_{true};
+ bool manifests_counts_set_{false};
Review Comment:
Java does not populate these fields in the summary builder and instead
directly call `Set(const std::string& property, const std::string& value)` to
set these values. Should we follow a similar pattern as these fields are not
commonly used?
##########
src/iceberg/snapshot.cc:
##########
@@ -491,6 +503,10 @@ void SnapshotSummaryBuilder::Merge(const
SnapshotSummaryBuilder& other) {
}
deleted_duplicate_files_ += other.deleted_duplicate_files_;
+ // Manifest counts (manifests_counts_set_ / manifests_created_ /
manifests_kept_ /
Review Comment:
If we remove these fields, then these comments should be removed too.
##########
src/iceberg/manifest/manifest_filter_manager.h:
##########
@@ -214,11 +273,22 @@ class ICEBERG_EXPORT ManifestFilterManager {
std::shared_ptr<Expression> delete_expr_;
std::unordered_set<std::string> delete_paths_;
+ std::unordered_set<DeleteFileKey, DeleteFileKeyHash> delete_file_keys_;
DataFileSet delete_files_;
+ std::vector<std::shared_ptr<DataFile>> deleted_files_;
+ std::unordered_set<DeleteFileKey, DeleteFileKeyHash> deleted_file_keys_;
PartitionSet drop_partitions_;
bool fail_missing_delete_paths_{false};
bool fail_any_delete_{false};
bool case_sensitive_{true};
+ int32_t duplicate_deletes_count_{0};
+
Review Comment:
```suggestion
```
##########
src/iceberg/manifest/manifest_filter_manager.h:
##########
@@ -214,11 +273,22 @@ class ICEBERG_EXPORT ManifestFilterManager {
std::shared_ptr<Expression> delete_expr_;
std::unordered_set<std::string> delete_paths_;
+ std::unordered_set<DeleteFileKey, DeleteFileKeyHash> delete_file_keys_;
Review Comment:
Why not using `DataFileSet`? Do `content_offset` and `content_size_in_bytes`
really matter in this case? I don't see Java ManifestFilterManager has this
special handling.
##########
src/iceberg/update/snapshot_update.h:
##########
@@ -122,6 +122,11 @@ class ICEBERG_EXPORT SnapshotUpdate : public PendingUpdate
{
Status Finalize(Result<const TableMetadata*> commit_result) override;
protected:
+ struct DeleteManifestEntry {
Review Comment:
Do we want to add a dedicated `DataFileWrapper` structure to equip it with
optional fields like `data_sequence_number` and `file_sequence_number`? It can
be relocated to `manifest_entry.h` as `DeleteManifestEntry` looks confusing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]