xushiyan commented on code in PR #5791:
URL: https://github.com/apache/hudi/pull/5791#discussion_r891957547


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##########
@@ -509,12 +510,22 @@ abstract class HoodieBaseRelation(val sqlContext: 
SQLContext,
     StructType(dataStructSchema.filterNot(f => 
partitionColumns.contains(f.name)))
 
   private def isSchemaEvolutionEnabled = {
+    // Auto set schema evolution
+    // Wrapped as a function, this function is triggered only when called, 
minimize the time cosumption.
+    def detectSchemaEvolution(): Boolean = {
+      val result = new 
FileBasedInternalSchemaStorageManager(metaClient).isValidHistorySchemaExist

Review Comment:
   is it possible to avoid creating a new FileBasedInternalSchemaStorageManager 
every check? the actual info is retrieved by the existing metaclient, so 
`isValidHistorySchemaExist` can be a static util taking the metaclient?. This 
would involve more refactoring work so it's optional here.



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/io/FileBasedInternalSchemaStorageManager.java:
##########
@@ -131,6 +131,27 @@ private List<String> getValidInstants() {
         .filterCompletedInstants().getInstants().map(f -> 
f.getTimestamp()).collect(Collectors.toList());
   }
 
+  /**
+   * Return whether an available historySchema file exist in schema folder or 
not.
+   */
+  public boolean isValidHistorySchemaExist() {
+    try {
+      List<String> validateCommits = getValidInstants();
+      FileSystem fs = FSUtils.getFs(baseSchemaPath.toString(), conf);
+      if (fs.exists(baseSchemaPath)) {
+        List<String> validaSchemaFiles = 
Arrays.stream(fs.listStatus(baseSchemaPath))
+            .filter(f -> f.isFile() && 
f.getPath().getName().endsWith(SCHEMA_COMMIT_ACTION))
+            .map(file -> file.getPath().getName()).filter(f -> 
validateCommits.contains(f.split("\\.")[0])).sorted().collect(Collectors.toList());

Review Comment:
   if every flag check `isSchemaEvolutionEnabled` invokes this, any performance 
concern? Not sure if the usability improvement can justify the perf cost or not.



##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/io/FileBasedInternalSchemaStorageManager.java:
##########
@@ -131,6 +131,27 @@ private List<String> getValidInstants() {
         .filterCompletedInstants().getInstants().map(f -> 
f.getTimestamp()).collect(Collectors.toList());
   }
 
+  /**
+   * Return whether an available historySchema file exist in schema folder or 
not.
+   */
+  public boolean isValidHistorySchemaExist() {
+    try {
+      List<String> validateCommits = getValidInstants();
+      FileSystem fs = FSUtils.getFs(baseSchemaPath.toString(), conf);
+      if (fs.exists(baseSchemaPath)) {
+        List<String> validaSchemaFiles = 
Arrays.stream(fs.listStatus(baseSchemaPath))
+            .filter(f -> f.isFile() && 
f.getPath().getName().endsWith(SCHEMA_COMMIT_ACTION))
+            .map(file -> file.getPath().getName()).filter(f -> 
validateCommits.contains(f.split("\\.")[0])).sorted().collect(Collectors.toList());

Review Comment:
   pls try simplifying this method chain to improve readability.. you can 
leverage `Stream::anyMatch()` ? also better to illustrate in the javadoc with 
an example showing file names



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to