nsivabalan commented on a change in pull request #1912:
URL: https://github.com/apache/hudi/pull/1912#discussion_r465142508
########## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/TimedWaitOnAppearConsistencyGaurd.java ##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hudi.common.util.ValidationUtils;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * A consistency guard that sleeps for a configured period of time only on APPEAR. It is a no-op for DISAPPEAR.
+ * It is intended specifically for the S3A filesystem, for the following reasons.
+ * This guard is used when deleting the data files that correspond to marker files.
+ * There are two tricky cases to consider. Case 1: data file creation is eventually consistent, so
+ * the file may not be found yet when the delete is issued. Case 2: the data file was never created
+ * in the first place because the process crashed.
+ * In S3A, GET and LIST are eventually consistent, and the delete() implementation internally does a LIST/EXISTS.
+ * Prior to this patch, Hudi used {@link FailSafeConsistencyGuard} and deleted data files as follows.
+ * Step1: wait for all files to appear, with linear backoff.
+ * Step2: issue deletes.
+ * Step3: wait for all files to disappear, with linear backoff.
+ * Step1 and Step3 are handled by {@link FailSafeConsistencyGuard}.
+ *
+ * We are simplifying these steps with {@link TimedWaitOnAppearConsistencyGaurd}.
+ * Step1: sleep for a configured threshold.
+ * Step2: issue deletes.
+ *
+ * With this, any file that was created should become visible within the configured threshold (eventual consistency).
+ * delete() returns false if the file is not found, so both cases are handled by this {@link ConsistencyGuard}.
+ */
+public class TimedWaitOnAppearConsistencyGaurd implements ConsistencyGuard {
+
+  private static final Logger LOG = LogManager.getLogger(TimedWaitOnAppearConsistencyGaurd.class);
+
+  private final ConsistencyGuardConfig consistencyGuardConfig;
+
+  public TimedWaitOnAppearConsistencyGaurd(FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig) {
+    this.consistencyGuardConfig = consistencyGuardConfig;
+    ValidationUtils.checkArgument(consistencyGuardConfig.isConsistencyCheckEnabled());
+  }
+
+  @Override
+  public void waitTillFileAppears(Path filePath) throws IOException, TimeoutException {
+    try {
+      Thread.sleep(consistencyGuardConfig.getInitialConsistencyCheckIntervalMs());

Review comment:
I am repurposing "hoodie.consistency.check.initial_interval_ms" as the sleep time here. Let me know if we need to introduce a new config for this.
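The two delete flows contrasted in the Javadoc can be illustrated with a small, self-contained Java sketch. It deliberately uses no Hudi, Hadoop, or S3A APIs: EventuallyConsistentStore, deleteWithBackoff, and deleteAfterTimedWait are hypothetical names, the store is an in-memory model with a simulated clock in which a write becomes visible only after a fixed lag, and the backoff loop is a simplified stand-in for the "wait for all files to appear" step, not the actual FailSafeConsistencyGuard code. The waitMs parameter plays the role the review comment assigns to hoodie.consistency.check.initial_interval_ms.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: none of these names are Hudi, Hadoop, or S3A APIs.
class TimedWaitDeleteSketch {

  /** Hypothetical in-memory store: a write becomes visible only lagMs after it happens. */
  static final class EventuallyConsistentStore {
    private final long lagMs;
    private final Map<String, Long> writeTimes = new HashMap<>();
    private long nowMs = 0; // simulated clock keeps the example deterministic

    EventuallyConsistentStore(long lagMs) {
      this.lagMs = lagMs;
    }

    void sleep(long ms) { nowMs += ms; }

    long now() { return nowMs; }

    void write(String key) { writeTimes.put(key, nowMs); }

    /** Ground truth, independent of visibility. */
    boolean stored(String key) { return writeTimes.containsKey(key); }

    /** What a reader observes: a key is visible only once the lag has elapsed. */
    boolean exists(String key) {
      Long writtenAt = writeTimes.get(key);
      return writtenAt != null && nowMs - writtenAt >= lagMs;
    }

    /** Checks existence first and returns false if the key is not (yet) visible. */
    boolean delete(String key) {
      if (!exists(key)) {
        return false;
      }
      writeTimes.remove(key);
      return true;
    }
  }

  /** Old flow, simplified: wait for each file to appear with linear backoff, then delete it. */
  static boolean deleteWithBackoff(EventuallyConsistentStore store, List<String> keys,
                                   long initialIntervalMs, int maxChecks) {
    for (String key : keys) {
      long intervalMs = initialIntervalMs;
      int checks = 0;
      while (!store.exists(key)) {
        if (++checks > maxChecks) {
          return false; // gives up, e.g. when the writer crashed before creating the file
        }
        store.sleep(intervalMs);
        intervalMs += initialIntervalMs; // linear backoff
      }
      store.delete(key);
      // Waiting for disappearance (Step3) is skipped: deletes are visible at once in this model.
    }
    return true;
  }

  /** New flow: one fixed sleep, then delete; a file that is not found just yields false. */
  static int deleteAfterTimedWait(EventuallyConsistentStore store, List<String> keys, long waitMs) {
    store.sleep(waitMs);
    int deleted = 0;
    for (String key : keys) {
      if (store.delete(key)) {
        deleted++;
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("f1.parquet", "f2.parquet", "never-written.parquet");

    EventuallyConsistentStore old = new EventuallyConsistentStore(300);
    old.write("f1.parquet");
    old.write("f2.parquet");
    boolean ok = deleteWithBackoff(old, keys, 100, 5);
    System.out.println("backoff: finished=" + ok + " after " + old.now() + " ms");

    EventuallyConsistentStore timed = new EventuallyConsistentStore(300);
    timed.write("f1.parquet");
    timed.write("f2.parquet");
    int deleted = deleteAfterTimedWait(timed, keys, 400);
    System.out.println("timed wait: deleted=" + deleted + " after " + timed.now() + " ms");
  }
}
```

Running main prints "backoff: finished=false after 1800 ms" and "timed wait: deleted=2 after 400 ms". With a 300 ms visibility lag, the backoff flow deletes the two real files but then spends its whole retry budget on the file that was never written (the crashed-writer case) and reports failure; the timed flow sleeps once, deletes both visible files, and simply gets false for the missing one. The trade-off is that the fixed sleep must exceed the store's actual visibility lag: with a 100 ms wait against a 300 ms lag, delete() returns false for files that do exist and they are left behind, which is why the choice of config value for this sleep matters.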
