nicholasxu opened a new issue, #10539:
URL: https://github.com/apache/hudi/issues/10539
**Describe the problem you faced**

When using a MOR table with `changelog.enabled`, `cdc.enabled`, and `read.streaming.enabled` set via Flink SQL, the streaming read fails with an `Unexpected cdc file split infer case: LOG_FILE` exception.
**To Reproduce**

Steps to reproduce the behavior:
1. Set up a Flink SQL client and set the configuration as follows:
```
set execution.checkpointing.interval='10s';
set state.checkpoints.dir='hdfs://HDFS-TEST/flink/checkpoints/nick_test_cdc';
set
execution.checkpointing.externalized-checkpoint-retention='RETAIN_ON_CANCELLATION';
set execution.checkpointing.timeout=600000;
set state.checkpoints.num-retained=3;
set state.savepoints.dir = 'hdfs://HDFS-TEST/flink/savepoints/nick_test_cdc';
```
2. Create a MySQL CDC table in the default catalog:
```
CREATE TABLE `DBS` (
`DB_ID` bigint,
`DESC` varchar(4000),
`DB_LOCATION_URI` varchar(4000),
`NAME` varchar(128),
`OWNER_NAME` varchar(128),
`OWNER_TYPE` varchar(10),
`CTLG_NAME` varchar(256),
PRIMARY KEY (`DB_ID`) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = '***',
'port' = '3306',
'username' = 'root',
'password' = '***',
'database-name' = 'hivedb',
'table-name' = 'DBS'
);
```
3. Create a Hudi catalog and use it:
```
CREATE CATALOG hudi_hive_catalog
WITH (
'type'='hudi',
'catalog.path' = 'cosn://bdp-xxx-yyy/user/hive/warehouse',
'hive.conf.dir' = '/path_to_hive',
'mode'='hms',
'table.external' = 'true',
'default-database' = 'hudi_default'
);
use CATALOG hudi_hive_catalog;
```
4. Create a Hudi table and insert data:
```
CREATE TABLE `DBS_TEST_CDC` (
`DB_ID` bigint,
`DESC` varchar(4000),
`DB_LOCATION_URI` varchar(4000),
`NAME` varchar(128),
`OWNER_NAME` varchar(128),
`OWNER_TYPE` varchar(10),
`CTLG_NAME` varchar(256),
PRIMARY KEY (`DB_ID`) NOT ENFORCED
) WITH (
'connector' = 'hudi',
'path' =
'cosn://bdp-xxx-yyy/user/hive/warehouse/hudi_default.db/DBS_TEST_CDC',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'true',
'cdc.enabled' = 'true',
'read.streaming.enabled'= 'true',
'read.streaming.check-interval'= '3',
'read.start-commit' = 'earliest',
'compaction.async.enabled' = 'true',
'compaction.delta_commits' = '3',
'hive_sync.enable' = 'true',
'hive_sync.mode' = 'hms',
'hive_sync.metastore.uris' = 'thrift://xxx:9083'
);
insert into hudi_hive_catalog.hudi_default.DBS_TEST_CDC
select * from default_catalog.default_database.DBS
```
5. Query the Hudi table:
```
set sql-client.execution.result-mode = tableau;
select * from hudi_hive_catalog.hudi_default.DBS_TEST_CDC;
```

6. Got a Flink exception:
```
(a835931b17a669ee458c3d6dd2e90fbc_b37a4b6cd3155ce46bdbdcbd40810486_0_11) switched from RUNNING to FAILED with failure cause:
java.lang.AssertionError: Unexpected cdc file split infer case: LOG_FILE
	at org.apache.hudi.table.format.cdc.CdcInputFormat.getRecordIterator(CdcInputFormat.java:190) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.format.cdc.CdcInputFormat.getRecordIteratorV2(CdcInputFormat.java:150) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.format.cdc.CdcInputFormat.lambda$initIterator$0(CdcInputFormat.java:104) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.format.cdc.CdcInputFormat$CdcFileSplitsIterator.hasNext(CdcInputFormat.java:224) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:269) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
	at org.apache.hudi.source.StreamReadOperator.consumeAsMiniBatch(StreamReadOperator.java:194) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
	at org.apache.hudi.source.StreamReadOperator.processSplits(StreamReadOperator.java:174) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:398) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:383) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:345) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:229) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [flink-dist-1.17.2.jar:1.17.2]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
```
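For context, the assertion is raised by the file-split dispatch in `CdcInputFormat.getRecordIterator` (line 190 in the trace above), which handles each inferred CDC case and falls through to an `AssertionError` for cases it has no branch for, such as `LOG_FILE`. Below is a minimal, self-contained sketch of that dispatch pattern only; the enum values and return strings are illustrative assumptions, not Hudi's actual implementation:

```java
// Illustrative sketch only: mirrors the dispatch pattern behind the
// "Unexpected cdc file split infer case" error. The enum values and the
// method body are assumptions for illustration, not Hudi's real code.
public class CdcDispatchSketch {

    // Hypothetical subset of CDC infer cases; LOG_FILE is the one named
    // in the reported error message.
    enum CdcInferCase { BASE_FILE_INSERT, BASE_FILE_DELETE, AS_IS, LOG_FILE }

    static String getRecordIterator(CdcInferCase inferCase) {
        switch (inferCase) {
            case BASE_FILE_INSERT:
                return "read base file rows as +I changes";
            case BASE_FILE_DELETE:
                return "emit -D changes for the removed base file";
            case AS_IS:
                return "replay the persisted cdc blocks as-is";
            default:
                // Any case without an explicit branch lands here, which is
                // what the stack trace shows for LOG_FILE.
                throw new AssertionError(
                    "Unexpected cdc file split infer case: " + inferCase);
        }
    }

    public static void main(String[] args) {
        System.out.println(getRecordIterator(CdcInferCase.AS_IS));
        try {
            getRecordIterator(CdcInferCase.LOG_FILE);
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In other words, the reader hit a split whose inferred case has no handling branch, so the question is why the streaming reader produced a `LOG_FILE` split here at all.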
7. The Hudi table directory listing is as follows:

**Expected behavior**

The streaming CDC read should return the change rows without failing. Is there an underlying bug here?
**Environment Description**

* Hudi version : 0.14.1
* Spark version :
* Hive version : 3.1.1
* Hadoop version : 3.2.2
* Storage (HDFS/S3/GCS..) : COS on Tencent Cloud
* Running on Docker? (yes/no) : no