aokolnychyi commented on a change in pull request #2415:
URL: https://github.com/apache/iceberg/pull/2415#discussion_r617104211
##########
File path: api/src/main/java/org/apache/iceberg/actions/DropTable.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.concurrent.ExecutorService;
+import java.util.function.Consumer;
+
+/**
+ * An action that deletes data, manifest, manifest lists in a table.
+ * <p>
+ * Reuires query engine to distribute parts of the work.
+ */
+public interface DropTable extends Action<DropTable, DropTable.Result> {
+
+  /**
+   * Passes an alternative delete implementation that will be used for manifests and data files.
+   * <p>
+   *
+   * @param deleteFunc a function that will be called to delete manifests and data files
+   * @return this for method chaining
+   */
+  DropTable deleteWith(Consumer<String> deleteFunc);

Review comment:
I agree about the parallelism in general. Whether we do this on the driver or on executors, I think we will be limited by the underlying storage in most cases (maybe not all?). One of the points that @RussellSpitzer brought up is the amount of data we would have to bring to the driver. By issuing deletes from executors after coalescing the partitions, we can avoid bringing data to the driver.
Technically, `toLocalIterator` helps us reduce the memory pressure on the driver, but that's not ideal. The worry with coalescing partitions to get reasonable parallelism is that it may still produce partitions that are too large.
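
To make the driver-side option concrete, here is a minimal sketch using only the JDK. It assumes the file paths arrive through a plain `Iterator<String>`, which stands in for what `toLocalIterator` would hand to the driver, and that deletes go through a `Consumer<String>` like the one passed to `deleteWith`. The class `DriverSideDeleteSketch`, the method `deleteInBatches`, and the `batchSize` parameter are hypothetical names for illustration and are not part of this PR or of Iceberg.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in Iceberg.
final class DriverSideDeleteSketch {

  private DriverSideDeleteSketch() {
  }

  /**
   * Pulls paths from the iterator and deletes them on the given pool,
   * keeping at most batchSize delete tasks in flight so that the driver
   * never has to hold the full list of paths.
   *
   * @return the number of paths handed to deleteFunc
   */
  public static long deleteInBatches(
      Iterator<String> paths,
      Consumer<String> deleteFunc,
      ExecutorService service,
      int batchSize) throws Exception {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }

    long deleted = 0;
    List<Future<?>> inFlight = new ArrayList<>(batchSize);

    while (paths.hasNext()) {
      String path = paths.next();
      // The lambda returns nothing, so this resolves to submit(Runnable).
      inFlight.add(service.submit(() -> deleteFunc.accept(path)));

      // Wait for the current batch before pulling more paths.
      if (inFlight.size() == batchSize) {
        deleted += awaitAll(inFlight);
      }
    }

    // Drain whatever is left from the last, possibly partial, batch.
    deleted += awaitAll(inFlight);
    return deleted;
  }

  // Blocks until every task has finished, then clears the list.
  private static int awaitAll(List<Future<?>> futures) throws Exception {
    int count = futures.size();
    for (Future<?> future : futures) {
      future.get(); // rethrows a failed delete as ExecutionException
    }
    futures.clear();
    return count;
  }
}
```

This only bounds what the driver holds in memory; every path still has to travel to the driver, which is the cost that issuing deletes from executors would avoid.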
