anmolnar commented on code in PR #7437:
URL: https://github.com/apache/hbase/pull/7437#discussion_r2491973658
##########
hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java:
##########
@@ -745,5 +745,60 @@ public StreamLacksCapabilityException(String message, Throwable cause) {
     public StreamLacksCapabilityException(String message) {
       super(message);
     }
+
+  }
+
+  /**
+   * Recreates a file.
+   * <p>
+   * This method first deletes the file at the given path. If the deletion fails and the file still
+   * exists, an IOException is thrown. After a successful deletion, it creates a new file with the
+   * specified permissions and overwrite flag.
+   * @param fs The FileSystem
+   * @param path The file path to recreate
+   * @param recursive If true, delete recursively (for a directory)
+   * @param perm The FsPermission to apply to the new file
+   * @param overwrite If true, overwrite an existing file (though it should be deleted by this
Review Comment:
You're trying to recreate a file. This should always be `true`.
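A minimal illustration of what I mean (sketch only, reusing the names from this diff):

```java
// The path was just deleted above, so recreation should always overwrite.
return create(fs, path, perm, true);
```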
##########
hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java:
##########
@@ -745,5 +745,60 @@ public StreamLacksCapabilityException(String message, Throwable cause) {
     public StreamLacksCapabilityException(String message) {
       super(message);
     }
+
+  }
+
+  /**
+   * Recreates a file.
+   * <p>
+   * This method first deletes the file at the given path. If the deletion fails and the file still
+   * exists, an IOException is thrown. After a successful deletion, it creates a new file with the
+   * specified permissions and overwrite flag.
+   * @param fs The FileSystem
+   * @param path The file path to recreate
+   * @param recursive If true, delete recursively (for a directory)
+   * @param perm The FsPermission to apply to the new file
+   * @param overwrite If true, overwrite an existing file (though it should be deleted by this
+   *          point)
+   * @return The FSDataOutputStream for the newly created file
+   * @throws IOException If the deletion fails and the file still exists, or if the creation fails.
+   */
+  public static FSDataOutputStream recreate(FileSystem fs, Path path, final boolean recursive,
+    FsPermission perm, boolean overwrite) throws IOException {
+    LOG.debug("Attempting to delete path {} (recursive={}) for recreation.", path, recursive);
+    try {
+      if (!fs.delete(path, recursive) && fs.exists(path)) {
+        // fs.delete returned false AND the file is still there. This is a failure.
+        throw new IOException("Failed to delete " + path + " for recreation. "
+          + "delete() returned false and path still exists.");
+      }
+    } catch (IOException e) {
Review Comment:
Why do you need all this complicated logic? The delete either failed or succeeded, right?
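For comparison, a minimal version could look like this (untested sketch, reusing the helper and names from this diff; I'm assuming a real failure surfaces as an IOException from `delete()`):

```java
public static FSDataOutputStream recreate(FileSystem fs, Path path, FsPermission perm)
  throws IOException {
  // A real failure surfaces as an IOException from delete(); a false return
  // just means there was nothing to delete, which is fine before a create.
  fs.delete(path, false);
  return create(fs, path, perm, true);
}
```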
##########
hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java:
##########
@@ -745,5 +745,60 @@ public StreamLacksCapabilityException(String message, Throwable cause) {
     public StreamLacksCapabilityException(String message) {
       super(message);
     }
+
+  }
+
+  /**
+   * Recreates a file.
+   * <p>
+   * This method first deletes the file at the given path. If the deletion fails and the file still
+   * exists, an IOException is thrown. After a successful deletion, it creates a new file with the
+   * specified permissions and overwrite flag.
+   * @param fs The FileSystem
+   * @param path The file path to recreate
+   * @param recursive If true, delete recursively (for a directory)
Review Comment:
This is about a single file. 'Recursive' doesn't make sense to me here.
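For a single file the delete call can simply be (sketch):

```java
// A plain file has no children, so non-recursive delete is enough.
fs.delete(path, false);
```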
##########
hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java:
##########
@@ -745,5 +745,60 @@ public StreamLacksCapabilityException(String message, Throwable cause) {
     public StreamLacksCapabilityException(String message) {
       super(message);
     }
+
+  }
+
+  /**
+   * Recreates a file.
+   * <p>
+   * This method first deletes the file at the given path. If the deletion fails and the file still
+   * exists, an IOException is thrown. After a successful deletion, it creates a new file with the
+   * specified permissions and overwrite flag.
+   * @param fs The FileSystem
+   * @param path The file path to recreate
+   * @param recursive If true, delete recursively (for a directory)
+   * @param perm The FsPermission to apply to the new file
+   * @param overwrite If true, overwrite an existing file (though it should be deleted by this
+   *          point)
+   * @return The FSDataOutputStream for the newly created file
+   * @throws IOException If the deletion fails and the file still exists, or if the creation fails.
+   */
+  public static FSDataOutputStream recreate(FileSystem fs, Path path, final boolean recursive,
+    FsPermission perm, boolean overwrite) throws IOException {
+    LOG.debug("Attempting to delete path {} (recursive={}) for recreation.", path, recursive);
+    try {
+      if (!fs.delete(path, recursive) && fs.exists(path)) {
Review Comment:
`fs.exists()` is redundant. `fs.delete()` should give you a clear result about whether the delete operation was successful or not.
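i.e. the check can just be (sketch):

```java
// A false return from delete() is already the answer; exists() adds nothing.
if (!fs.delete(path, recursive)) {
  throw new IOException("Failed to delete " + path + " for recreation");
}
```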
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java:
##########
@@ -405,10 +409,70 @@ public void preCleanupBulkLoad(ObserverContext<RegionCoprocessorEnvironment> ctx
   public void onConfigurationChange(Configuration conf) {
     boolean maybeUpdatedConfValue =
       conf.getBoolean(HConstants.HBASE_GLOBAL_READONLY_ENABLED_KEY,
         HConstants.HBASE_GLOBAL_READONLY_ENABLED_DEFAULT);
+    try {
+      recreateActiveClusterIdFile(conf);
+    } catch (IOException e) {
+      LOG.error("Encountered I/O exception while recreating the active cluster id file {} ",
+        e.getMessage());
+    }
     if (this.globalReadOnlyEnabled != maybeUpdatedConfValue) {
       this.globalReadOnlyEnabled = maybeUpdatedConfValue;
       LOG.info("Config {} has been dynamically changed to {}",
         HConstants.HBASE_GLOBAL_READONLY_ENABLED_KEY, this.globalReadOnlyEnabled);
     }
   }
+
+  /**
+   * Recreates the active cluster ID file with retry logic. This method will attempt to recreate
+   * the file, retrying on IOException until the configured timeout is exceeded.
+   * @param conf The configuration
+   * @throws IOException if the file recreation fails after all retries.
+   */
+  private void recreateActiveClusterIdFile(Configuration conf) throws IOException {
+    Path rootDir = CommonFSUtils.getRootDir(conf);
+    FileSystem fs = rootDir.getFileSystem(conf);
+    FsPermission perms =
+      CommonFSUtils.getFilePermissions(fs, fs.getConf(), HConstants.DATA_FILE_UMASK_KEY);
+    Path activeClusterIdFilePath = new Path(rootDir, HConstants.ACTIVE_CLUSTER_SUFFIX_FILE_NAME);
+
+    long timeout = conf.getLong(HConstants.HBASE_ACTIVE_CLUSTERIDFILE_RECREATE_TIMEOUT_KEY,
+      HConstants.HBASE_ACTIVE_CLUSTERIDFILE__RECREATE_TIMEOUT_DEFAULT);
+    long retryInterval = HConstants.HBASE_ACTIVE_CLUSTERIDFILE__RECREATE_RETRY_INTERVAL_MS;
+    long startTime = System.currentTimeMillis();
+
+    LOG.info("Attempting to recreate active cluster ID file: {}", activeClusterIdFilePath);
+
+    while (true) {
+      try {
+        // Attempt to recreate the file
+        CommonFSUtils.recreate(fs, activeClusterIdFilePath, true, perms, false);
+
+        // Success
+        LOG.info("Successfully recreated active cluster ID file: {}", activeClusterIdFilePath);
+        return;
+      } catch (IOException e) {
+        long elapsedTime = System.currentTimeMillis() - startTime;
+        if (elapsedTime >= timeout) {
+          // Timeout exceeded. Throw the last exception, wrapping it for context.
+          throw new IOException("Failed to recreate active cluster ID file "
+            + activeClusterIdFilePath + " after retrying for " + elapsedTime + "ms", e);
+        }
+
+        // Not timed out, log and sleep before retrying
+        LOG.warn(
+          "Failed to recreate active cluster ID file: {}. Retrying in {}ms... "
Review Comment:
I don't think you need to retry creating the file. Try it only once and log an error if it fails. There's no such thing as a filesystem that is unavailable for some reason but becomes available at some point if we wait for it. It should be available at all times, or something is wrong.
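Something like this should be enough, since the caller already catches and logs the IOException (untested sketch using the same helpers and constants as this diff):

```java
private void recreateActiveClusterIdFile(Configuration conf) throws IOException {
  Path rootDir = CommonFSUtils.getRootDir(conf);
  FileSystem fs = rootDir.getFileSystem(conf);
  FsPermission perms =
    CommonFSUtils.getFilePermissions(fs, fs.getConf(), HConstants.DATA_FILE_UMASK_KEY);
  Path activeClusterIdFilePath = new Path(rootDir, HConstants.ACTIVE_CLUSTER_SUFFIX_FILE_NAME);
  // Single attempt: if the filesystem is down, retrying won't bring it back.
  CommonFSUtils.recreate(fs, activeClusterIdFilePath, true, perms, false);
  LOG.info("Recreated active cluster ID file: {}", activeClusterIdFilePath);
}
```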
##########
hbase-common/src/main/java/org/apache/hadoop/hbase/util/CommonFSUtils.java:
##########
@@ -745,5 +745,60 @@ public StreamLacksCapabilityException(String message, Throwable cause) {
     public StreamLacksCapabilityException(String message) {
       super(message);
     }
+
+  }
+
+  /**
+   * Recreates a file.
+   * <p>
+   * This method first deletes the file at the given path. If the deletion fails and the file still
+   * exists, an IOException is thrown. After a successful deletion, it creates a new file with the
+   * specified permissions and overwrite flag.
+   * @param fs The FileSystem
+   * @param path The file path to recreate
+   * @param recursive If true, delete recursively (for a directory)
+   * @param perm The FsPermission to apply to the new file
+   * @param overwrite If true, overwrite an existing file (though it should be deleted by this
+   *          point)
+   * @return The FSDataOutputStream for the newly created file
+   * @throws IOException If the deletion fails and the file still exists, or if the creation fails.
+   */
+  public static FSDataOutputStream recreate(FileSystem fs, Path path, final boolean recursive,
+    FsPermission perm, boolean overwrite) throws IOException {
+    LOG.debug("Attempting to delete path {} (recursive={}) for recreation.", path, recursive);
+    try {
+      if (!fs.delete(path, recursive) && fs.exists(path)) {
+        // fs.delete returned false AND the file is still there. This is a failure.
+        throw new IOException("Failed to delete " + path + " for recreation. "
+          + "delete() returned false and path still exists.");
+      }
+    } catch (IOException e) {
+      // The delete operation itself threw an exception.
+      // Check if the file is still there before propagating.
+      boolean stillExists = false;
+      try {
+        stillExists = fs.exists(path);
+      } catch (IOException ioe) {
+        // Failed to even check existence. Add this as suppressed.
+        e.addSuppressed(ioe);
+        throw new IOException(
+          "Failed to delete " + path + " and could not " + "verify post-deletion state.", e);
+      }
+
+      if (stillExists) {
+        // Deletion threw, and file is still there. This is a hard failure.
+        throw new IOException(
+          "Failed to delete " + path + " for recreation. " + "Operation threw: " + e.getMessage(),
+          e);
+      }
+      LOG.warn("Delete call for {} threw exception, "
+        + "but file no longer exists. Proceeding with creation.", path, e);
+    }
+
+    LOG.debug("Path {} is clear. Proceeding with creation with perm={} overwrite={}", path, perm,
+      overwrite);
+
+    // Create the new file with the specified permissions
+    return create(fs, path, perm, overwrite);
Review Comment:
You should not recreate the file when read-only mode is turned on. In that case you only need to delete the file and let the active cluster create a new one. How do you cover that scenario?
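Roughly what I'd expect in `onConfigurationChange` (untested sketch; `fs`, `perms` and `activeClusterIdFilePath` would have to be computed there the same way `recreateActiveClusterIdFile` does, and the else branch just mirrors what the patch currently does unconditionally):

```java
if (maybeUpdatedConfValue) {
  // Read-only turned on: just drop the file; the active cluster writes a new one.
  fs.delete(activeClusterIdFilePath, false);
} else {
  // Read-only turned off: this cluster is active again, so recreate the file.
  CommonFSUtils.recreate(fs, activeClusterIdFilePath, true, perms, false);
}
```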