nicholasxu opened a new issue, #10458:
URL: https://github.com/apache/hudi/issues/10458
**Describe the problem you faced**
When I run the Change Data Capture Query from the Flink quick start on the Hudi
official website, I get 'java.lang.IllegalArgumentException: Can not create a
Path from an empty string', which is in fact caused by an empty baseFile name.
**To Reproduce**
Steps to reproduce the behavior:
1. Build Hudi 0.14.1: `mvn clean package -Pflink1.17,spark3.2,scala-2.12,flink-bundle-shade-hive3 '-Dmaven.test.skip=true'`
2. Put `hudi-flink1.17-bundle-0.14.1.jar` into the Flink 1.17 `lib` directory.
3. Run the 'Change Data Capture Query' from the Flink quick start on the Hudi
official website, as follows:
set sql-client.execution.result-mode = tableau;
CREATE TABLE hudi_table(
  ts BIGINT,
  uuid VARCHAR(40) PRIMARY KEY NOT ENFORCED,
  rider VARCHAR(20),
  driver VARCHAR(20),
  fare DOUBLE,
  city VARCHAR(20)
)
PARTITIONED BY (`city`)
WITH (
  'connector' = 'hudi',
  'path' = 'cosn://bigdata-emr-test-xxxx/tmp/hudi_table',
  'table.type' = 'MERGE_ON_READ',
  'changelog.enabled' = 'true', -- this option enables the changelog
  'cdc.enabled' = 'true'        -- this option enables the CDC log
);
-- insert data using values
INSERT INTO hudi_table
VALUES
(1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
(1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'san_francisco'),
(1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90,'san_francisco'),
(1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
(1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'),
(1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40,'sao_paulo'),
(1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06,'chennai'),
(1695115999911,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
SET 'execution.runtime-mode' = 'batch';
UPDATE hudi_table SET fare = 25.0 WHERE uuid = '334e26e9-8355-45cc-97c6-c31daf0df330';
-- Query the table in stream mode in another shell to see change logs
SET 'execution.runtime-mode' = 'streaming';
SELECT * FROM hudi_table /*+ OPTIONS('read.streaming.enabled'='true') */;

When I ran this last SELECT statement, I got the exception.
**Expected behavior**
The error log is as follows:

I guessed there was something wrong with the empty path, so I added debug
logging in Hudi to trace it.

I then got logs as follows:

I then tried to trace the cause of the error:

The Hudi metadata info is as follows:

I do not know why the baseFile name is empty; your help is appreciated.
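For context, the `Caused by` section of the stack trace points at `org.apache.hadoop.fs.Path.checkPathArg`. The snippet below is a simplified, self-contained sketch of that kind of argument check (class and method names here are illustrative, not the actual Hadoop source), showing why an empty baseFile name fails the moment it is wrapped in a `Path`:

```java
// Simplified sketch of the argument check performed by
// org.apache.hadoop.fs.Path (hadoop-common). This is NOT the real Hadoop
// source, just an illustration of the failure mode: an empty string is
// rejected before any filesystem work happens.
public class PathArgSketch {
    public static void checkPathArg(String path) {
        if (path == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        if (path.length() == 0) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
    }

    public static void main(String[] args) {
        String baseFileName = ""; // what HoodieCDCExtractor apparently passes in
        try {
            checkPathArg(baseFileName);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So the real question is upstream: why the write stat handed to `HoodieCDCExtractor.getDependentFileSliceForLogFile` carries an empty base-file name, not why `Path` rejects it.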
**Environment Description**
* Hudi version : 0.14.1
* Spark version : 3.2.2
* Hive version : 3.1.3
* Hadoop version : 3.2.2
* Storage (HDFS/S3/GCS..) : COS on Tencent Cloud
* Running on Docker? (yes/no) : no
**Stacktrace**
2024-01-08 17:07:28,934 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: split_monitor(table=[hudi_table_1], fields=[ts, uuid, rider, driver, fare, city]) (1/1)#0 (3192a837c71d3aecaa11be4a8fadb46c_775ad0e5e90e3004c8cb1dade74c44c8_0_0) switched from RUNNING to FAILED with failure cause:
org.apache.hudi.exception.HoodieException: Fail to get the dependent file slice for a log file
    at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.getDependentFileSliceForLogFile(HoodieCDCExtractor.java:345) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.parseWriteStat(HoodieCDCExtractor.java:281) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.lambda$extractCDCFileSplits$1(HoodieCDCExtractor.java:133) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_322]
    at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.extractCDCFileSplits(HoodieCDCExtractor.java:128) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.source.IncrementalInputSplits.getCdcInputSplits(IncrementalInputSplits.java:462) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.source.IncrementalInputSplits.getIncInputSplits(IncrementalInputSplits.java:330) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.source.IncrementalInputSplits.inputSplits(IncrementalInputSplits.java:311) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.source.StreamReadMonitoringFunction.monitorDirAndForwardSplits(StreamReadMonitoringFunction.java:215) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.hudi.source.StreamReadMonitoringFunction.run(StreamReadMonitoringFunction.java:187) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist-1.17.2.jar:1.17.2]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67) ~[flink-dist-1.17.2.jar:1.17.2]
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) ~[flink-dist-1.17.2.jar:1.17.2]
Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:172) ~[hadoop-common-3.2.2.jar:?]
    at org.apache.hadoop.fs.Path.<init>(Path.java:184) ~[hadoop-common-3.2.2.jar:?]
    at org.apache.hadoop.fs.Path.<init>(Path.java:129) ~[hadoop-common-3.2.2.jar:?]
    at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.getDependentFileSliceForLogFile(HoodieCDCExtractor.java:336) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
    ... 12 more
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]