[GitHub] [hudi] imrewang commented on issue #9513: [SUPPORT]Index Bootstrap deleted some snapshot data that has been batch-inserted into Hudi ?

via GitHub Tue, 29 Aug 2023 01:24:53 -0700


imrewang commented on issue #9513:
URL: https://github.com/apache/hudi/issues/9513#issuecomment-1696992219


   # I. Snapshot
   
   ### 1. MySQL `snapshot data` is synchronized to Hudi
   ```sql
   CREATE TABLE sink_scho
   (
   `name` STRING,
   `address` STRING,
   CONSTRAINT `PRIMARY` PRIMARY KEY (`name`) NOT ENFORCED
   )
    WITH (
     'connector' = 'hudi',
     'compaction.max_memory' = '1024',
     'write.task.max.size' = '2048',
     'write.merge.max_memory' = '1024',
     'write.operation' = 'insert',
     'index.bootstrap.enabled' = 'true',
     'path' = 'hdfs://132.178.4.190:8020/user/hive/warehouse/test.db/scho',
     'write.tasks' = '1',
     'hive_sync.enable' = 'true',
     'hive_sync.mode' = 'hms',
     'hive_sync.metastore.uris' = 'thrift://132.178.4.190:9083',
     'hive_sync.table' = 'scho',
     'hive_sync.db' = 'test',
     'hive_sync.username' = '',
     'hive_sync.password' = '' )
   ```
   
   # II. Incremental
   
   ### 1. Flink CDC captures MySQL change data
   
   ```java
           // Host, port, credentials, database, and table are placeholder values.
           SourceFunction<String> mySqlSource = MySqlSource.<String>builder()
                   .hostname("123:23:23:34")
                   .port(123)
                   .username("ss")
                   .password("111")  // password must be a String
                   .databaseList("db")
                   .tableList("tab")
                   .deserializer(new JsonDebeziumDeserializationSchema())
                   // This is the incremental stage, so start from the latest offset.
                   .startupOptions(StartupOptions.latest())
                   .build();
   ```
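   
   The JSON records produced by `JsonDebeziumDeserializationSchema` follow the Debezium envelope, whose `op` field marks each change as `c` (create), `u` (update), `d` (delete), or `r` (snapshot read). The block below is a minimal sketch of how those codes correspond to the changelog row kinds tested later (`+I`, `-U`, `+U`, `-D`), assuming the usual Debezium-to-Flink convention. `DebeziumOpMapper` is a hypothetical helper for illustration only; it is not part of Flink CDC or Hudi.
   
   ```java
   import java.util.List;

   // Hypothetical helper for illustration; not part of Flink CDC or Hudi.
   final class DebeziumOpMapper {
       private DebeziumOpMapper() {}

       /** Maps a Debezium "op" code to the changelog row kinds it produces. */
       static List<String> toRowKinds(String op) {
           switch (op) {
               case "c": // row created
               case "r": // row read during an initial snapshot
                   return List.of("+I");
               case "u": // update: before image, then after image
                   return List.of("-U", "+U");
               case "d": // row deleted
                   return List.of("-D");
               default:
                   throw new IllegalArgumentException("Unknown Debezium op: " + op);
           }
       }
   }
   ```
   
   With `StartupOptions.latest()` the source skips the initial snapshot, so only `c`, `u`, and `d` events should be expected from it.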
   
   > My incremental SQL (that is, the SQL that executes the [ **delete**, update, insert ] operations above):
   > 
   > ```sql
   > CREATE TABLE sink_scho
   > (
   > `name` STRING,
   > `address` STRING,
   > CONSTRAINT `PRIMARY` PRIMARY KEY (`name`) NOT ENFORCED
   > )
   >  WITH (
   >   'connector' = 'hudi',
   >   'compaction.max_memory' = '1024',
   >   'write.task.max.size' = '2048',
   >   'write.merge.max_memory' = '1024',
   >   'write.operation' = 'upsert',
   >   'index.bootstrap.enabled' = 'true',
   >   'path' = 'hdfs://132.178.4.190:8020/user/hive/warehouse/test.db/scho',
   >   'write.tasks' = '1',
   >   'hive_sync.enable' = 'true',
   >   'hive_sync.mode' = 'hms',
   >   'hive_sync.metastore.uris' = 'thrift://132.178.4.190:9083',
   >   'hive_sync.table' = 'scho',
   >   'hive_sync.db' = 'test',
   >   'hive_sync.username' = '',
   >   'hive_sync.password' = '' )
   > ```
   
   
   
   
   ### 2. Test -D, +I, -U, +U
   
   > ### Are you sure row-level deletion of snapshot data is supported? 😖
   > The Hudi `snapshot data` as synchronized to the Hive table:
   > 
   > ![image](https://user-images.githubusercontent.com/94045136/262917370-22172ea8-e878-4d85-b63b-b310d1298cf1.png)
   > 
   > Update one `snapshot data` row (`John update`) and insert two `incremental data` rows (`11 111` and `22 222`). _The UPDATE and INSERT succeed._ Now delete one `snapshot data` row (`Jane 456 Elm st`) and one `incremental data` row (`11 111`):
   > 
   > ![image](https://user-images.githubusercontent.com/94045136/262918684-3df76c86-ab19-4d9b-a3bd-d2a704280dd4.png)
   > 
   > Deleting the **`snapshot data`** row `Jane 456 Elm st` **failed**, while deleting the **`incremental data`** row `11 111` **succeeded**:
   > 
   > ![image](https://user-images.githubusercontent.com/94045136/262921041-53a8f6ad-77e3-46da-afa5-db1db0900e37.png)
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
