nsivabalan commented on code in PR #7612:
URL: https://github.com/apache/hudi/pull/7612#discussion_r1068908725


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java:
##########
@@ -358,23 +364,27 @@ public static String createNewFileId(String idPfx, int 
id) {
    * Get the file extension from the log file.
    */
   public static String getFileExtensionFromLog(Path logPath) {
-    Matcher matcher = LOG_FILE_PATTERN.matcher(logPath.getName());
+    boolean isArchivedLog = logPath.getName().contains(ARCHIVED_LOG_PREFIX);
+    Matcher matcher =  isArchivedLog ? 
ARCHIVED_LOG_FILE_PATTERN.matcher(logPath.getName()) :
+        LOG_FILE_PATTERN.matcher(logPath.getName());
     if (!matcher.find()) {
       throw new InvalidHoodiePathException(logPath, "LogFile");
     }
-    return matcher.group(3);
+    return isArchivedLog ? ARCHIVE_STR : matcher.group(3);
   }
 
   /**
    * Get the first part of the file name in the log file. That will be the 
fileId. Log file do not have instantTime in
    * the file name.
    */
   public static String getFileIdFromLogPath(Path path) {
-    Matcher matcher = LOG_FILE_PATTERN.matcher(path.getName());
+    boolean isArchivedLog = path.getName().contains(ARCHIVED_LOG_PREFIX);
+    Matcher matcher =  isArchivedLog ? 
ARCHIVED_LOG_FILE_PATTERN.matcher(path.getName())
+        : LOG_FILE_PATTERN.matcher(path.getName());
     if (!matcher.find()) {
       throw new InvalidHoodiePathException(path, "LogFile");
     }
-    return matcher.group(1);
+    return isArchivedLog ? COMMITS_STR : matcher.group(1);

Review Comment:
   archived commits has a static prefix (".commits.archived") and so felt its 
better to have a separate regex rather than using one regex to work for both as 
it might be complex. I prefer this way so we know exactly whats the expected 
format in each case. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to