AshinGau opened a new issue, #10735:
URL: https://github.com/apache/hudi/issues/10735
**Describe the problem you faced**
1. When I create a Hudi table in the Hive catalog, it works well in Flink SQL, but the table cannot be read by Spark or by the Flink Hudi catalog. The Hudi table created through the Hive catalog appears to have the wrong schema and input format in the Hive metastore, as shown by `SHOW CREATE TABLE`.
2. After I insert/update/delete a MOR table, in Flink the result of querying the `_ro` table is the same as the `_rt` table, but Spark returns different results when querying the `_ro` table.
**To Reproduce**
Flink 1.17.2 + Hudi 0.14.1
Steps to reproduce the behavior:
1. Launch flink sql
```
export FLINK_VERSION=1.17
export HUDI_VERSION=0.14.1
./bin/sql-client.sh embedded -j lib/hudi-flink${FLINK_VERSION}-bundle-${HUDI_VERSION}.jar shell
```
2. Create the hive catalog and hudi catalog
```
-- hive catalog
create catalog hive with (
'type' = 'hive',
'default-database' = 'default',
'hive-conf-dir' ='/usr/local/service/hive/conf');
-- hudi catalog
create catalog hudi with (
'type'='hudi',
'catalog.path' = 'hdfs://xxx/hudi_flink_hive_catalog',
'hive.conf.dir' = '/usr/local/service/hive/conf',
'mode'='hms');
```
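Before proceeding, both catalogs can be verified from the Flink SQL client (a minimal check; `hudi_flink` is the database used in the steps below):
```
show catalogs;     -- both 'hive' and 'hudi' should be listed
use catalog hive;  -- switch to the hive catalog
show databases;    -- 'hudi_flink' should appear here
```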
3. Create a hudi table in hive catalog
Use the following SQL to create a Hudi table in the Hive catalog. Flink 1.17 can insert into the partitioned table but throws errors when querying it; this works well in Flink 1.14, so it may be a bug in Flink 1.17 + Hudi 0.14.
```
use catalog hive;
use hudi_flink;
CREATE TABLE hive_ctl_table(
ts BIGINT,
uuid VARCHAR(40) PRIMARY KEY NOT ENFORCED,
rider VARCHAR(20),
driver VARCHAR(20),
fare DOUBLE,
city VARCHAR(20)
)
-- PARTITIONED BY (`city`) -- Flink 1.17 can insert into the partitioned table but throws errors when querying; works well in Flink 1.14
WITH (
'connector' = 'hudi',
'path' = 'hdfs://xxx/hudi_flink.db/hive_ctl_table',
'table.type' = 'MERGE_ON_READ'
);
```
The Hudi table created through the Hive catalog supports insert/update/delete/select in Flink SQL, but throws errors when queried by Spark or through the Flink Hudi catalog. The table appears to have the wrong schema and input format in the Hive metastore, as shown by `SHOW CREATE TABLE`: it has no fields and is stored as `TextInputFormat`.
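The registered metadata can be inspected directly against the metastore, e.g. from spark-sql or the Hive CLI (a sketch; database and table names are taken from the steps above):
```
-- Run in spark-sql or the Hive CLI to see what Flink registered in the HMS.
-- For a Hudi MOR table one would expect the full column list and a Hudi
-- input format (e.g. HoodieParquetRealtimeInputFormat); what shows up here
-- instead is an empty field list and TextInputFormat.
USE hudi_flink;
SHOW CREATE TABLE hive_ctl_table;
```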

4. Create a hudi table in hudi catalog
```
use catalog hudi;
use hudi_flink;
CREATE TABLE hudi_ctl_table(
ts BIGINT,
uuid VARCHAR(40) PRIMARY KEY NOT ENFORCED,
rider VARCHAR(20),
driver VARCHAR(20),
fare DOUBLE,
city VARCHAR(20)
)
PARTITIONED BY (`city`)
WITH (
'connector' = 'hudi',
'path' = 'hdfs://xxx/hudi_ctl_table',
'table.type' = 'MERGE_ON_READ'
);
```
After creating the table, use the insert/update/delete commands introduced in https://hudi.apache.org/docs/flink-quick-start-guide#insert-data to produce data, roughly as sketched below. In Flink, the result of querying the `_ro` table is the same as the `_rt` table, but Spark returns different results when querying the `_ro` table.
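For reference, the data can be produced along these lines (a sketch adapted from the quick-start guide; the row values are illustrative):
```
-- Insert rows (illustrative values matching the schema above).
insert into hudi_ctl_table values
  (1695159649087, '334e26e9-8355-45cc-97c6-c31daf0df330', 'rider-A', 'driver-K', 19.10, 'san_francisco'),
  (1695091554788, 'e96c4396-3fad-413a-a942-4cb36106d721', 'rider-B', 'driver-L', 27.70, 'sao_paulo');

-- Update: re-inserting the same primary key upserts the row.
insert into hudi_ctl_table values
  (1695159649087, '334e26e9-8355-45cc-97c6-c31daf0df330', 'rider-A', 'driver-K', 25.00, 'san_francisco');

-- Delete, per the row-level delete described in the quick-start guide.
delete from hudi_ctl_table where uuid = 'e96c4396-3fad-413a-a942-4cb36106d721';
```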

When I list the files in the Hudi path, there are only log files and no base files, so the result of querying the `_ro` table should be empty. However, Flink returns a result that merges the insert/update/delete operations, just the same as the `_rt` table; queries like the ones sketched below reproduce the inconsistency.
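A minimal comparison, assuming the `_ro`/`_rt` table names that Hudi's hive sync registers by default:
```
-- In Flink SQL (hive catalog): both queries return the merged data.
select * from hudi_ctl_table_ro;
select * from hudi_ctl_table_rt;
-- In spark-sql against the same metastore, hudi_ctl_table_ro returns a
-- different result (empty here, since there are no base files yet).
```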

**Expected behavior**
1. The Hudi table created through the Hive catalog only has the wrong schema and input format registered; it can still be parsed correctly by reading `.hoodie`. I am a Doris committer, and after finding this bug I submitted a PR (https://github.com/apache/doris/pull/31181) that reads `.hoodie` to get the right schema. It works well when querying the Hudi table created through the Hive catalog.
2. I am not sure whether the Flink SQL result for the `_ro` table is correct, but it is inconsistent with Spark, and the Flink result is likely to be incorrect.
**Environment Description**
* Hudi version : 0.14.1
* Flink version: 1.17.2
* Spark version : 3.2.1
* Hive version : 3.1.1
* Hadoop version : 3.2.2
* Storage (HDFS/S3/GCS..) : HDFS 3.2.2
* Running on Docker? (yes/no) : no