nicholasxu opened a new issue, #10458:
URL: https://github.com/apache/hudi/issues/10458

   **Describe the problem you faced**
   
   When I ran the Change Data Capture Query from the Flink quick start on the Hudi official website, I got `java.lang.IllegalArgumentException: Can not create a Path from an empty string`, which is in fact caused by an empty baseFile name.
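   For context, the exception comes from Hadoop's `org.apache.hadoop.fs.Path` constructor, which rejects empty strings. A minimal sketch that mimics that check (a hypothetical stand-in class, not the real Hadoop code):

```java
// Minimal sketch mimicking the argument check in Hadoop's
// org.apache.hadoop.fs.Path (hypothetical stand-in class; the real
// check lives in Path.checkPathArg).
public class PathCheckSketch {

    public static void checkPathArg(String path) {
        if (path == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        if (path.length() == 0) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
    }

    public static void main(String[] args) {
        // A valid path passes silently...
        checkPathArg("cosn://bucket/tmp/hudi_table");
        // ...but an empty baseFile name triggers exactly the error in this report.
        try {
            checkPathArg("");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

   So any code path that feeds an empty baseFile name into `new Path(...)` fails with exactly this message.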
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Build Hudi 0.14.1: `mvn clean package -Pflink1.17,spark3.2,scala-2.12,flink-bundle-shade-hive3 '-Dmaven.test.skip=true'`
   
   2. Put hudi-flink1.17-bundle-0.14.1.jar into the Flink 1.17 lib directory.
   
   3. Run the 'Change Data Capture Query' from the Flink quick start on the Hudi official website, as follows:
   
   set sql-client.execution.result-mode = tableau;
   
   CREATE TABLE hudi_table(
       ts BIGINT,
       uuid VARCHAR(40) PRIMARY KEY NOT ENFORCED,
       rider VARCHAR(20),
       driver VARCHAR(20),
       fare DOUBLE,
       city VARCHAR(20)
   )
   PARTITIONED BY (`city`)
   WITH (
     'connector' = 'hudi',
     'path' = 'cosn://bigdata-emr-test-xxxx/tmp/hudi_table',
     'table.type' = 'MERGE_ON_READ',
     'changelog.enabled' = 'true',  -- this option enables the changelog
     'cdc.enabled' = 'true' -- this option enables the CDC log
   );
   -- insert data using values
   INSERT INTO hudi_table
   VALUES
   (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
   (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'san_francisco'),
   (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90,'san_francisco'),
   (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
   (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'),
   (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40,'sao_paulo'),
   (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06,'chennai'),
   (1695115999911,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
   SET 'execution.runtime-mode' = 'batch';
   UPDATE hudi_table SET fare = 25.0 WHERE uuid = '334e26e9-8355-45cc-97c6-c31daf0df330';
   -- Query the table in stream mode in another shell to see change logs
   SET 'execution.runtime-mode' = 'streaming';
   select * from hudi_table /*+ OPTIONS('read.streaming.enabled'='true') */;
   
   When I ran the last line, `select * from hudi_table`, I got the exception.
   
   **Expected behavior**
   The streaming query should return the change log without errors. Instead, the error log is as follows:
   
![image](https://github.com/apache/hudi/assets/12593964/1ef264e2-c6db-4e1a-810a-c3dc1cf78c5b)
   
   I guessed something was wrong with the empty path, so I added debug logging in Hudi to trace it.
   
![image](https://github.com/apache/hudi/assets/12593964/d10e8a41-5c8e-4405-ac72-52390725fd73)
   
   Then I got the following logs:
   
![image](https://github.com/apache/hudi/assets/12593964/d610e97e-ec5f-4605-8de9-e3ccad061dec)
   
   Then I tried to trace the cause of the error:
   
![image](https://github.com/apache/hudi/assets/12593964/6ab35ac7-df46-4984-94f3-d46aab695bfa)
   
   The Hudi metadata looks as follows:
   
![image](https://github.com/apache/hudi/assets/12593964/2e124375-910e-4751-ab6e-54c3b90a9892)
   
   I do not know why the baseFile name is empty; your help is appreciated.
   
   **Environment Description**
   
   * Hudi version : 0.14.1
   
   * Spark version : 3.2.2
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.2.2
   
   * Storage (HDFS/S3/GCS..) : COS on Tencent Cloud
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**
   2024-01-08 17:07:28,934 WARN  org.apache.flink.runtime.taskmanager.Task [] - Source: split_monitor(table=[hudi_table_1], fields=[ts, uuid, rider, driver, fare, city]) (1/1)#0 (3192a837c71d3aecaa11be4a8fadb46c_775ad0e5e90e3004c8cb1dade74c44c8_0_0) switched from RUNNING to FAILED with failure cause:
   org.apache.hudi.exception.HoodieException: Fail to get the dependent file slice for a log file
        at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.getDependentFileSliceForLogFile(HoodieCDCExtractor.java:345) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.parseWriteStat(HoodieCDCExtractor.java:281) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.lambda$extractCDCFileSplits$1(HoodieCDCExtractor.java:133) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_322]
        at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.extractCDCFileSplits(HoodieCDCExtractor.java:128) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.source.IncrementalInputSplits.getCdcInputSplits(IncrementalInputSplits.java:462) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.source.IncrementalInputSplits.getIncInputSplits(IncrementalInputSplits.java:330) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.source.IncrementalInputSplits.inputSplits(IncrementalInputSplits.java:311) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.source.StreamReadMonitoringFunction.monitorDirAndForwardSplits(StreamReadMonitoringFunction.java:215) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.source.StreamReadMonitoringFunction.run(StreamReadMonitoringFunction.java:187) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist-1.17.2.jar:1.17.2]
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) ~[flink-dist-1.17.2.jar:1.17.2]
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) ~[flink-dist-1.17.2.jar:1.17.2]
   Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:172) ~[hadoop-common-3.2.2.jar:?]
        at org.apache.hadoop.fs.Path.<init>(Path.java:184) ~[hadoop-common-3.2.2.jar:?]
        at org.apache.hadoop.fs.Path.<init>(Path.java:129) ~[hadoop-common-3.2.2.jar:?]
        at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.getDependentFileSliceForLogFile(HoodieCDCExtractor.java:336) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        ... 12 more
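   The cause chain shows the empty name reaching Hadoop's `Path` constructor inside `getDependentFileSliceForLogFile`. A hypothetical defensive sketch (illustrative only, not the actual Hudi code; the class and method names here are invented) of validating the base-file name before building a `Path`:

```java
// Hypothetical guard sketch (not actual Hudi code): validate the base-file
// name read from the commit metadata before constructing a filesystem Path,
// so the failure surfaces with a descriptive message instead of
// "Can not create a Path from an empty string".
public class BaseFileGuard {

    public static String resolveBaseFile(String baseFileName) {
        if (baseFileName == null || baseFileName.isEmpty()) {
            throw new IllegalStateException(
                "Fail to get the dependent file slice for a log file: base file name is empty");
        }
        return baseFileName;
    }
}
```

   With a guard like this, the failure would point directly at the empty baseFile name in the write stat rather than at the Hadoop `Path` internals.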
   
   
   