eg-kazakov opened a new issue, #10258: URL: https://github.com/apache/hudi/issues/10258
**Describe the problem you faced**

When Spark tries to load a Hudi DataFrame from the datasource with the following Hudi timeline:

```
./.hoodie:
total 15392
drwxr-xr-x  9 ekazakov  staff      288 Dec  6 20:35 .
drwxr-xr-x  5 ekazakov  staff      160 Dec  6 23:00 ..
-rw-r--r--  1 ekazakov  staff   257944 Dec  6 13:50 20231206064834739.replacecommit
-rw-r--r--  1 ekazakov  staff      123 Dec  6 13:50 20231206064834739.replacecommit.inflight
-rw-r--r--  1 ekazakov  staff     2522 Dec  6 13:48 20231206064834739.replacecommit.requested
-rw-r--r--  1 ekazakov  staff  3350228 Dec  6 14:06 20231206065013087.clean
-rw-r--r--  1 ekazakov  staff  2128543 Dec  6 14:01 20231206065013087.clean.inflight
-rw-r--r--  1 ekazakov  staff  2128543 Dec  6 14:01 20231206065013087.clean.requested
-rw-r--r--  1 ekazakov  staff      885 Nov 23 10:44 hoodie.properties
```

I get the following error, and the existing data fails to load:

org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: `_hoodie_commit_seqno`, `_hoodie_commit_time`, `_hoodie_file_name`, `_hoodie_partition_path`, `_hoodie_record_key`

**To Reproduce**

Steps to reproduce the behavior:

1. I've attached my Hudi table structure here: [google drive](https://drive.google.com/drive/folders/13L21rtKr4Ov4qFg-_0C3ZC6gifb2JeZM?usp=sharing)
2. Here is the code to reproduce the failure:

```
import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.DataSourceReadOptions._

// Incremental query over the attached table
val df = sparkSession.sqlContext.read
  .format("hudi")
  .option(QUERY_TYPE.key, QUERY_TYPE_INCREMENTAL_OPT_VAL)
  // Restrict the read to the two daily partitions of interest
  .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "/movement_type=*/{year=2023/month=12/day=05,year=2023/month=12/day=04}/provider=*/qk=*/*.parquet")
  // Only pick up commits made after this instant
  .option(BEGIN_INSTANTTIME.key, "20231205063117")
  .load("hudi-output-duplicated-columns/")
```

**Expected behavior**

The data should load from the Hudi table without failures.

**Environment Description**

* Hudi version : 0.13.0
* Spark version : 3.3.2

**Additional context**

While debugging, I noticed that the columns are duplicated:

<img width="981" alt="Screenshot 2023-12-06 at 22 08 03" src="https://github.com/apache/hudi/assets/16684232/fc385fe3-2ed9-4ae1-b83e-fe55d284133f">
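To confirm which names are repeated in a schema while debugging, something like the following may help. It is only a sketch: `DuplicateColumns` and `findDuplicates` are hypothetical names, not part of Hudi or Spark, and the case-insensitive comparison is an assumption meant to mirror Spark's default `spark.sql.caseSensitive=false` behaviour rather than a copy of its actual check.

```scala
// Hypothetical debugging helper, not part of Hudi or Spark.
// Returns the column names that occur more than once in `fieldNames`,
// compared case-insensitively, lower-cased and sorted alphabetically.
object DuplicateColumns {
  def findDuplicates(fieldNames: Seq[String]): Seq[String] =
    fieldNames
      .groupBy(_.toLowerCase)                        // bucket names ignoring case
      .collect { case (name, occurrences) if occurrences.size > 1 => name }
      .toSeq
      .sorted
}
```

For a Spark `StructType` named `schema`, the input could be `schema.fieldNames.toSeq`. For example, `DuplicateColumns.findDuplicates(Seq("_hoodie_commit_time", "id", "_hoodie_commit_time"))` yields a sequence containing only `_hoodie_commit_time`.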
