punish-yh opened a new issue, #7221:
URL: https://github.com/apache/hudi/issues/7221

   **Describe the problem you faced**
   
   I used Flink SQL to create a Hudi MOR table with its metadata stored in the Hive metastore.
   Then I used Spark with the same Hive metastore to read the data that Flink inserted,
   but I got 0 rows of data, with no error or exception in the logs.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Flink SQL table creation DDL:
   
   ```
   create catalog hive with(
   'type'='hudi',
   'mode'='hms',
   'hive.conf.dir'='/xxx/hive-conf'
   );
   
   CREATE TABLE if not exists hive.db.table_name(
    `id` BIGINT PRIMARY KEY NOT ENFORCED,
    `eid` STRING
   )WITH (
       'connector' = 'hudi',
       'path' = 'hdfs://nameservice/user/hudi/db/table_name',
       'table.type' = 'MERGE_ON_READ',
       'compaction.max_memory' = '512')
   
   ```
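   For comparison, a hypothetical variant of the DDL above with Hudi's Hive sync options enabled. These option names (`hive_sync.enable`, `hive_sync.mode`, `hive_sync.metastore.uris`, `hive_sync.db`, `hive_sync.table`) come from the Hudi Flink connector configuration; the URIs and paths are placeholders, so treat this as a sketch rather than a verified fix:
   
   ```sql
   -- Sketch: same table, but asking the Hudi Flink writer to register the
   -- table in the Hive metastore with Hudi's input formats via Hive sync,
   -- so engines other than Flink can query it through HMS.
   CREATE TABLE if not exists hive.db.table_name(
    `id` BIGINT PRIMARY KEY NOT ENFORCED,
    `eid` STRING
   ) WITH (
       'connector' = 'hudi',
       'path' = 'hdfs://nameservice/user/hudi/db/table_name',
       'table.type' = 'MERGE_ON_READ',
       'compaction.max_memory' = '512',
       'hive_sync.enable' = 'true',
       'hive_sync.mode' = 'hms',
       'hive_sync.metastore.uris' = 'thrift://xxxx:9083',  -- placeholder URI
       'hive_sync.db' = 'db',
       'hive_sync.table' = 'table_name'
   );
   ```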
   
   2. Insert data from a MySQL source into the Hudi table (this works; I checked the result with Flink SQL and it is correct).
   
   3. Check the table format in Hive:
   ```
   hive> show create table table_name;
   OK
   CREATE TABLE `table_name`(
   )
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
   LOCATION
     'hdfs://nameservice/user/hive/warehouse/db.db/table_name'
   TBLPROPERTIES (
     'flink.compaction.max_memory'='512',
     'flink.connector'='hudi',
     'flink.path'='hdfs://nameservice/user/hudi/db/table_name',
     'flink.schema.0.data-type'='BIGINT NOT NULL',
     'flink.schema.0.name'='id',
     'flink.schema.1.data-type'='VARCHAR(2147483647)',
     'flink.schema.1.name'='eid',
     'flink.schema.2.data-type'='VARCHAR(2147483647)',
     'flink.schema.2.name'='oid',
     'flink.schema.primary-key.columns'='id',
     'flink.schema.primary-key.name'='PK_3386',
     'flink.table.type'='MERGE_ON_READ',
     'transient_lastDdlTime'='1666582177')
   ```
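   Note that the DDL above shows plain-text formats (`TextInputFormat` / `LazySimpleSerDe`) and no columns. For reference, a Hive-synced Hudi MOR table typically looks roughly like the following (class names from Hudi's hadoop module; exact DDL varies by version, and Hive sync normally registers two views, `table_name_ro` and `table_name_rt`):
   
   ```sql
   -- Sketch of a hive-synced MOR table, abbreviated; not output from this setup.
   CREATE EXTERNAL TABLE `table_name_rt`(
     ...
   )
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
   ...
   ```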
   
   4. Spark read code:
   ```
   SparkSession spark = SparkSession
           .builder()
           .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
           .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
           .config("hive.input.format", "org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat")
           .config("hive.metastore.uris", "thrift://xxxx:9083")
           .config("spark.sql.warehouse.dir", "hdfs://nameservice/user/hudi/db/")
           .enableHiveSupport()
           .getOrCreate();

   spark.sql("show databases").show();
   spark.sql("use db").show();
   spark.sql("show tables").show();
   spark.sql("desc db.table_name").show();
   spark.sql("SELECT count(1) FROM db.table_name").show();
   ```
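   As a cross-check (not part of the original report), Spark can also read a Hudi table directly by its filesystem path, bypassing the metastore entry entirely; this helps tell apart "data files unreadable" from "metastore entry wrong". A sketch, with the path taken from the DDL above:
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;
   
   public class HudiPathRead {
       public static void main(String[] args) {
           SparkSession spark = SparkSession.builder()
                   .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                   .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
                   .getOrCreate();
   
           // Snapshot-read the MOR table straight from its HDFS path,
           // without going through the Hive metastore table entry.
           Dataset<Row> df = spark.read().format("hudi")
                   .load("hdfs://nameservice/user/hudi/db/table_name");
           System.out.println(df.count());
       }
   }
   ```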
   
   **Expected behavior**
   
   I expected the Spark code to count the rows of this table, and to get the
   table schema and the correct count. In fact, only "show databases",
   "use db", and "show tables" work normally.
   
   Actual output:
   ```
   +--------+---------+-------+
   |col_name|data_type|comment|
   +--------+---------+-------+
   +--------+---------+-------+
   
   +--------+
   |count(1)|
   +--------+
   |       0|
   +--------+
   ```
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.2.2
   
   * Flink version : 1.14.5
   
   * Hive version : 2.1.1-cdh6.3.2
   
   * Hadoop version : 3.0.0-cdh6.3.2
   
   * Storage (HDFS/S3/GCS..) : hdfs
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Must I set 'hive_sync.enable'='true' in the Flink job so that Spark can read the table correctly?
   
   **Stacktrace**
   
   Both the Flink and Spark jobs execute successfully, with no errors and no exceptions.
   
   

