yihua opened a new pull request, #8108:
URL: https://github.com/apache/hudi/pull/8108

   ### Change Logs
   
   This PR makes the Metadata Table Validator (`HoodieMetadataTableValidator`) 
skip validation of the metadata table when no data table exists at the 
provided base path, which avoids false positives.  A warning message is still 
printed:
   ```
   23/03/06 17:59:53 WARN HoodieMetadataTableValidator: The Hudi data table is not found: [file:/Users/ethan/Work/tmp/script/123/test_table]. Skipping the validation of the metadata table.
   org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path file:/Users/ethan/Work/tmp/script/123/test_table/.hoodie
         at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:57)
         at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:137)
         at org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:689)
         at org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:81)
         at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:770)
         at org.apache.hudi.utilities.HoodieMetadataTableValidator.<init>(HoodieMetadataTableValidator.java:180)
         at org.apache.hudi.utilities.HoodieMetadataTableValidator.main(HoodieMetadataTableValidator.java:347)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:498)
         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.io.FileNotFoundException: File file:/Users/ethan/Work/tmp/script/123/test_table/.hoodie does not exist
         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
         at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:410)
         at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114)
         at org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:404)
         at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51)
         ... 18 more
   ```
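
The skip-on-missing-table behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: the class `SkipValidationSketch`, the nested `TableNotFoundException` stand-in, and the method `validateIfTableExists` are hypothetical names, and the check uses `java.nio.file` on a local path instead of Hudi's `HoodieTableMetaClient` and Hadoop `FileSystem`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SkipValidationSketch {

    /** Hypothetical stand-in for Hudi's TableNotFoundException. */
    static class TableNotFoundException extends RuntimeException {
        TableNotFoundException(String message) {
            super(message);
        }
    }

    /** Throws if the base path has no ".hoodie" metadata directory (assumed layout). */
    static void checkTableValidity(Path basePath) {
        Path metaPath = basePath.resolve(".hoodie");
        if (!Files.isDirectory(metaPath)) {
            throw new TableNotFoundException("Hoodie table not found in path " + metaPath);
        }
    }

    /**
     * Returns true if validation would run, false if it was skipped
     * because the data table does not exist.
     */
    static boolean validateIfTableExists(Path basePath) {
        try {
            checkTableValidity(basePath);
        } catch (TableNotFoundException e) {
            // Warn and skip instead of failing the whole validation job.
            System.err.println("WARN The Hudi data table is not found: [" + basePath
                + "]. Skipping the validation of the metadata table.");
            return false;
        }
        // The real metadata table validation would run here.
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path basePath = Files.createTempDirectory("hudi-sketch");
        // No ".hoodie" directory yet, so validation is skipped.
        System.out.println("validated: " + validateIfTableExists(basePath));
        Files.createDirectory(basePath.resolve(".hoodie"));
        // The table now "exists", so validation proceeds.
        System.out.println("validated: " + validateIfTableExists(basePath));
    }
}
```

Catching the exception at the point where the table is first resolved keeps the warning visible while still letting the job exit normally.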
   
   ### Impact
   
   Avoids failing the metadata table validation when the data table does not 
exist.  Tested locally; the validator behaves as expected.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

