wgtmac commented on code in PR #652:
URL: https://github.com/apache/iceberg-cpp/pull/652#discussion_r3231912961


##########
src/iceberg/manifest/manifest_filter_manager.h:
##########
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#pragma once
+
+/// \file iceberg/manifest/manifest_filter_manager.h
+/// Filters an existing snapshot's manifest list, marking data files as DELETED
+/// or EXISTING based on row-filter expressions, exact path deletes, and 
partition drops.
+
+#include <functional>
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <unordered_set>
+#include <vector>
+
+#include "iceberg/expression/inclusive_metrics_evaluator.h"
+#include "iceberg/expression/manifest_evaluator.h"
+#include "iceberg/iceberg_export.h"
+#include "iceberg/manifest/manifest_list.h"
+#include "iceberg/manifest/manifest_writer.h"
+#include "iceberg/result.h"
+#include "iceberg/type_fwd.h"
+#include "iceberg/util/partition_value_util.h"
+
+namespace iceberg {
+
+/// \brief Factory type for creating ManifestWriter instances during 
filtering/merging.
+///
+/// The factory receives the partition spec ID (to look up the spec) and the 
manifest
+/// content type, and returns a new ManifestWriter ready for writing.  The 
caller
+/// (i.e. MergingSnapshotUpdate in PR2) captures metadata, FileIO, and 
snapshot ID
+/// inside the lambda.
+using ManifestWriterFactory = 
std::function<Result<std::unique_ptr<ManifestWriter>>(
+    int32_t spec_id, ManifestContent content)>;
+
+/// \brief Filters an existing snapshot's manifest list.
+///
+/// The manager accumulates delete conditions incrementally, then applies them 
all
+/// at once in a single FilterManifests() call.  Manifests that contain no 
deleted
+/// entries are returned unchanged (no I/O).  Manifests that do contain deleted
+/// entries are rewritten with those entries marked DELETED.
+///
+/// The manager is content-agnostic: pass ManifestContent::kData to process 
data
+/// manifests, or ManifestContent::kDeletes to process delete manifests.
+///
+/// \note This class is non-copyable and non-movable.
+class ICEBERG_EXPORT ManifestFilterManager {
+ public:
+  ManifestFilterManager(ManifestContent content, std::shared_ptr<FileIO> 
file_io);
+
+  ManifestFilterManager(const ManifestFilterManager&) = delete;
+  ManifestFilterManager& operator=(const ManifestFilterManager&) = delete;
+
+  /// \brief Register a row-filter expression.
+  ///
+  /// Any manifest entry whose column metrics indicate the file may satisfy the
+  /// expression will be marked DELETED.
+  ///
+  /// \param expr The expression to match files against
+  /// \param case_sensitive Whether field name matching is case-sensitive
+  void DeleteByRowFilter(std::shared_ptr<Expression> expr, bool case_sensitive 
= true);
+
+  /// \brief Register an exact file path for deletion.
+  ///
+  /// Any manifest entry whose file_path matches this path will be marked 
DELETED.
+  ///
+  /// \param path The exact file path to delete
+  void DeleteFile(std::string_view path);

Review Comment:
   Could we keep the Java file-object delete path in the manager API instead of 
only accepting paths? Java ManifestFilterManager supports delete(F file) in 
addition to delete(CharSequence path); 
RewriteFiles.deleteFile(DataFile/DeleteFile) and 
StreamingDelete.deleteFile(DataFile) use that path, and the manager then 
retains file identity/manifest references for filesToBeDeleted(), 
buildSummary(), duplicate-delete accounting, and delete-manifest DV cleanup. 
With only DeleteFile(string_view), later layers have to degrade object deletes 
to paths and lose the state Java relies on. It would be good to add 
DataFile/DeleteFile object-delete coverage alongside the current path-only 
tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to