xiarixiaoyao commented on code in PR #5791:
URL: https://github.com/apache/hudi/pull/5791#discussion_r892040473


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/io/FileBasedInternalSchemaStorageManager.java:
##########
@@ -131,6 +131,27 @@ private List<String> getValidInstants() {
         .filterCompletedInstants().getInstants().map(f -> f.getTimestamp()).collect(Collectors.toList());
   }
 
+  /**
+   * Return whether an available historySchema file exist in schema folder or not.
+   */
+  public boolean isValidHistorySchemaExist() {
+    try {
+      List<String> validateCommits = getValidInstants();
+      FileSystem fs = FSUtils.getFs(baseSchemaPath.toString(), conf);
+      if (fs.exists(baseSchemaPath)) {
+        List<String> validaSchemaFiles = Arrays.stream(fs.listStatus(baseSchemaPath))
+            .filter(f -> f.isFile() && f.getPath().getName().endsWith(SCHEMA_COMMIT_ACTION))
+            .map(file -> file.getPath().getName()).filter(f -> validateCommits.contains(f.split("\\.")[0])).sorted().collect(Collectors.toList());

Review Comment:
   Good question.
   1) If schema evolution has happened, some schema files will exist in the 
schema folder. We check for those files to enable schema evolution 
automatically. This check should run only once; afterwards we call 
sparkSession.sessionState.conf.setConfString(DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key, result.toString) 
so that the function is not called repeatedly. See 
[HoodieBaseRelation.scala](https://github.com/apache/hudi/pull/5791/files#diff-b95f9369e8ae90c511e1cff0863c8207d61c9e3dc2345350552a74d3a068bd31), 
line 517.
   2) If no schema evolution has happened, no schema files exist in the schema 
folder, so the folder should be empty. Calling fs.listStatus() on an empty 
folder should be very fast.
   
   Finally, there won't be many schema files in the schema folder: there are 
at most 10 files in this directory.
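
   To make the two points above concrete, here is a minimal sketch of the 
"list once, then remember the answer" idea. It is not the code in this PR: it 
uses java.nio instead of the Hadoop FileSystem API so that it runs on its own, 
and the class name HistorySchemaCheck, the SCHEMA_SUFFIX value, and the cached 
field are all assumptions made for illustration (the PR stores the result in 
the Spark session conf instead).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the check discussed above; names are illustrative only.
final class HistorySchemaCheck {
  // Assumed placeholder for SCHEMA_COMMIT_ACTION from the PR.
  static final String SCHEMA_SUFFIX = ".schemacommit";

  private final Path schemaDir;
  private final Set<String> completedInstants;
  private Boolean cached; // null until the first check has run

  HistorySchemaCheck(Path schemaDir, Set<String> completedInstants) {
    this.schemaDir = schemaDir;
    this.completedInstants = completedInstants;
  }

  // One directory listing: keep regular files with the schema suffix whose
  // name prefix (text before the first '.') is a completed instant.
  static List<String> validSchemaFiles(Path schemaDir, Set<String> completedInstants)
      throws IOException {
    if (!Files.isDirectory(schemaDir)) {
      return Collections.emptyList();
    }
    try (Stream<Path> entries = Files.list(schemaDir)) {
      return entries
          .filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          .filter(name -> name.endsWith(SCHEMA_SUFFIX))
          .filter(name -> completedInstants.contains(name.split("\\.")[0]))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // Runs the listing on the first call only; later calls reuse the result.
  synchronized boolean hasValidHistorySchema() throws IOException {
    if (cached == null) {
      cached = !validSchemaFiles(schemaDir, completedInstants).isEmpty();
    }
    return cached;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("schema");
    Set<String> instants = new HashSet<>(Arrays.asList("001", "002"));
    HistorySchemaCheck check = new HistorySchemaCheck(dir, instants);
    // Empty folder: a single cheap listing, and the answer is remembered.
    System.out.println(check.hasValidHistorySchema());
  }
}
```

   With an empty folder the listing returns nothing, and later calls to 
hasValidHistorySchema() never touch the file system again, which mirrors the 
intent of setting the conf value once.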



