0x574C opened a new issue #8685:
URL: https://github.com/apache/incubator-doris/issues/8685


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   0.15.0-rc04
   
   ### What's Wrong?
   
   Create a Hive table:
   ```
   create table target_table(id int,code string,p_day string) stored as parquet;
   
   insert into target_table values(1,'code1','2022-03-28'),(2,'code2','2022-03-28');
   
   alter table target_table add columns (content string);
   insert into target_table values(4,'code1','2022-03-28','content4'),(5,'code2','2022-03-28','content5');
   ```
   After the two inserts, there are two Parquet files in HDFS:
   ```
   [root@dev-master2 ~]# hdfs dfs -ls -h hdfs://dev-master2:8020/user/hive/warehouse/testdb.db/target_table
   Found 2 items
   -rwxrwx--x   3 hive hive        640 2022-03-28 09:49 hdfs://dev-master2:8020/user/hive/warehouse/testdb.db/target_table/000000_0
   -rwxrwx--x   3 hive hive        803 2022-03-28 09:56 hdfs://dev-master2:8020/user/hive/warehouse/testdb.db/target_table/000000_0_copy_1
   ```
   Create a Doris table:
   ```
   create table doris_table
   (id int,content string,code string,p_day date) 
   partition by range(p_day) 
   (
       partition p20220328 values less than ("2022-03-29")
   )
    DISTRIBUTED BY HASH(code)
    PROPERTIES("replication_num" = "1");
   ```
   Load data from the Hive table:
   ```
   LOAD LABEL test.target_table_label 
   ( 
       DATA INFILE("hdfs://dev-master2:8020/user/hive/warehouse/testdb.db/target_table/*") 
       INTO TABLE `doris_table`
       FORMAT AS "parquet"
   )
   WITH BROKER broker_name ("username"="hdfs", "password"="hdfs")
   ```
   <b>The load job failed with the following error message because the `content` column is not present in the Parquet file `000000_0`.</b>
   ```
   type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = file: hdfs://dev-master2:8020/user/hive/warehouse/testdb.db/target_table/000000_0 error:Invalid Column Name:content
   ```
   Merge the two Parquet files into one:
   ```
   insert overwrite table target_table select * from target_table;
   ```
   Rerun the load task; this time it finishes successfully.
   
   ### What You Expected?
   
   The load task should succeed even before the two Parquet files are merged, with any column that is missing from a Parquet file automatically set to null.
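
The expected behavior can be illustrated with a small Python sketch. It is not Doris code and does not reflect how the broker load reader is implemented; `align_rows` and `DORIS_COLUMNS` are hypothetical names used only for this illustration. Rows are modeled as plain dictionaries, as if they had already been read from the two Parquet files above, and any column absent from a file's schema is filled with `None`.

```python
from typing import Any, Dict, List, Sequence, Tuple


def align_rows(
    rows: Sequence[Dict[str, Any]], target_columns: Sequence[str]
) -> List[Tuple[Any, ...]]:
    """Project each row onto target_columns, using None for absent columns."""
    # dict.get returns None when the file's schema lacks the column.
    return [tuple(row.get(col) for col in target_columns) for row in rows]


# Rows written before `content` was added (file 000000_0).
old_file_rows = [
    {"id": 1, "code": "code1", "p_day": "2022-03-28"},
    {"id": 2, "code": "code2", "p_day": "2022-03-28"},
]

# Rows written after `content` was added (file 000000_0_copy_1).
new_file_rows = [
    {"id": 4, "code": "code1", "p_day": "2022-03-28", "content": "content4"},
    {"id": 5, "code": "code2", "p_day": "2022-03-28", "content": "content5"},
]

# Column order of doris_table in this report (hypothetical constant name).
DORIS_COLUMNS = ["id", "content", "code", "p_day"]

loaded = align_rows(old_file_rows + new_file_rows, DORIS_COLUMNS)
for row in loaded:
    print(row)
```

With this behavior, the rows from `000000_0` would be loaded with a null `content` instead of failing the whole job.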
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


