jackye1995 commented on a change in pull request #2255:
URL: https://github.com/apache/iceberg/pull/2255#discussion_r580688034



##########
File path: api/src/main/java/org/apache/iceberg/actions/MigrateTable.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+
+/**
+ * An action that migrates an existing table to Iceberg.
+ */
+public interface MigrateTable extends Action<MigrateTable.Result> {
+  /**
+   * Adds additional properties to the newly created Iceberg table. Any 
properties with
+   * the same key name will be overwritten.
+   *
+   * @param properties a map of properties to be included
+   * @return this for method chaining
+   */
+  MigrateTable withProperties(Map<String, String> properties);
+
+  /**
+   * Adds an additional property to the newly created Iceberg table. Any 
properties
+   * with the same key name will be overwritten.
+   *
+   * @param key   the key of the property to add
+   * @param value the value of the property to add
+   * @return this for method chaining
+   */
+  MigrateTable withProperty(String key, String value);

Review comment:
       nit: based on your description, I think this method should be named 
`addProperty`, `withXXX` usually indicates overwriting the value.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RemoveOrphanFiles.java
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.function.Consumer;
+
+/**
+ * An action that removes orphan files in a table.
+ * <p>
+ * A metadata or data file is considered orphan if it is not reachable by any 
valid snapshot.
+ * The set of actual files is built by listing the underlying storage which 
makes this operation
+ * expensive.
+ */
+public interface RemoveOrphanFiles extends Action<RemoveOrphanFiles.Result> {
+  /**
+   * Passes a location which should be scanned for orphan files.
+   * <p>
+   * If not set, the root table location will be scanned potentially removing 
both orphan data and
+   * metadata files.
+   *
+   * @param location the location where to look for orphan files
+   * @return this for method chaining
+   */
+  RemoveOrphanFiles location(String location);
+
+  /**
+   * Removes orphan files only if they are older than the given timestamp.
+   * <p>
+   * This is a safety measure to avoid removing files that are being added to 
the table.
+   * For example, there may be a concurrent operation adding new files while 
the action searches
+   * for orphan files. New files may not be referenced by the metadata yet but 
they are not orphan.
+   * <p>
+   * If not set, defaults to a timestamp 3 days ago.
+   *
+   * @param olderThanTimestamp a long timestamp, as returned by {@link 
System#currentTimeMillis()}
+   * @return this for method chaining
+   */
+  RemoveOrphanFiles olderThan(long olderThanTimestamp);
+
+  /**
+   * Passes an alternative delete implementation that will be used for orphan 
files.
+   * <p>
+   * This method allows to customize the delete func. For example, one may set 
a custom delete func
+   * and collect all orphan files into a set instead of physically removing 
them.
+   * <p>
+   * If not called, uses the table's {@link org.apache.iceberg.io.FileIO io} 
implementation.
+   *
+   * @param deleteFunc a function that will be called to delete files
+   * @return this for method chaining
+   */
+  RemoveOrphanFiles deleteWith(Consumer<String> deleteFunc);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns locations of orphan files.
+     */
+    Iterable<String> orphanFileLocations();

Review comment:
       > Well, I think it makes sense to use `ClosableIterable` in this 
particular case. At the same time, other places in this PR could continue to 
use `Iterable` as it is unlikely they will benefit from lazy materialization.
   
   Why? Cases like `Iterable<ManifestFile> rewrittenManifests()` can also 
benefit from using `CloseableIterable`. I feel if we want to do it for one, we 
should do it for all, because we cannot just assume one result might need to 
close resource, the other does not. If an engine does it for one action, it is 
likely that it will have this behavior for all actions.
   
   With that being said, I agree that using `CloseableIterable` would make the 
interface harder to use. Why not just use `Iterable`, and let `Result` be the 
object to implement `close()`? If one engine has a closable iterable, it can be 
closed there.

##########
File path: api/src/main/java/org/apache/iceberg/actions/SnapshotTable.java
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+
+/**
+ * An action that creates an independent snapshot of an existing table.
+ */
+public interface SnapshotTable extends Action<SnapshotTable.Result> {
+  /**
+   * Sets the table location.
+   *
+   * @param location a table location
+   * @return this for method chaining
+   */
+  SnapshotTable withLocation(String location);
+
+  /**
+   * Adds additional properties to the newly created Iceberg table. Any 
properties with
+   * the same key name will be overwritten.
+   *
+   * @param properties a map of properties to be included
+   * @return this for method chaining
+   */
+  SnapshotTable withProperties(Map<String, String> properties);
+
+  /**
+   * Adds an additional property to the newly created Iceberg table. Any 
properties
+   * with the same key name will be overwritten.
+   *
+   * @param key   the key of the property to add
+   * @param value the value of the property to add
+   * @return this for method chaining
+   */
+  SnapshotTable withProperty(String key, String value);

Review comment:
       nit: same comment as before, can be `addProperty`

##########
File path: api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.Table;
+
+/**
+ * An API that should be implemented by query engine integrations for 
providing actions.
+ */
+public interface ActionsProvider {
+
+  /**
+   * Instantiates an action to snapshot an existing table.
+   */
+  default SnapshotTable snapshotTable(String sourceTableIdent, String 
destTableIdent) {
+    throw new UnsupportedOperationException(this.getClass().getName() + " does 
not implement snapshotTable");
+  }
+
+  /**
+   * Instantiates an action to migrate an existing table to Iceberg.
+   */
+  default MigrateTable migrateTable(String tableIdent) {

Review comment:
       I am not sure why `TableIdentifier` is in the catalog module, but here I 
agree that it seems the most flexible to use a string. For Spark it in the end 
uses `spark.sessionState().sqlParser().parseMultipartIdentifier(string)` anyway.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to