ROOBALJINDAL opened a new issue, #12438: URL: https://github.com/apache/hudi/issues/12438
**Background:** We have created an implementation, `MssqlDebeziumSource`, similar to the `MysqlDebeziumSource` that already exists. We are using the Apicurio schema registry and an AWS MSK cluster for Kafka.

**Issue:** We create a table by ingesting a CSV using `CsvDFSSource`, which writes a commit file in the `.hoodie` folder containing the schema details. When we then run the multistreamer job for the same table to process CDC events from Kafka for an upsert operation, and the Kafka topic is empty, Hudi performs an empty commit without a schema: the schema details in that commit are empty.

**Use case:** When the Hive server goes down, we create a new one and sync the external tables from S3 to restore them. The tables are recreated without columns, because the commit schema has no columns in it. I tested this by changing the source class to `JsonKafkaSource` and performing the same CSV-to-CDC-Kafka transition; that worked fine, and the empty commit was written with schema details.

**Code details:** Here an empty dataset is created without a schema. This looks intentional, but can we make it configurable?

**Expected behavior:** The schema should be present even for an empty commit from `DebeziumSource`. If that is not feasible, can we make it configurable?

**Environment Description**

* Hudi version : 14.0
* Spark version : 3.4.1
* Storage (HDFS/S3/GCS..) : s3
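The requested change can be illustrated with a minimal sketch. This is **not** actual Hudi code: the function name, the `attach_schema_on_empty` flag, and the `FetchResult` type are all hypothetical, modeling how a Debezium-style source could attach the latest known schema to an otherwise empty batch when a (proposed) config flag is enabled, instead of returning no schema at all.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FetchResult:
    rows: list              # empty when the Kafka topic has no new events
    schema: Optional[str]   # Avro schema string, or None for a schema-less commit

def fetch_next_batch(events: list, latest_schema: str,
                     attach_schema_on_empty: bool) -> FetchResult:
    """Hypothetical model of the fetch path: today an empty topic yields an
    empty dataset with no schema; the flag sketches the configurable
    behavior this issue asks for."""
    if not events:
        # With the flag on, the empty commit would still carry the schema
        # (e.g. fetched from the schema registry / latest commit metadata).
        schema = latest_schema if attach_schema_on_empty else None
        return FetchResult(rows=[], schema=schema)
    return FetchResult(rows=events, schema=latest_schema)
```

With the flag enabled, a downstream Hive sync of an otherwise empty commit would still see column definitions; with it disabled, current behavior is preserved.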
