Re: [I] [Feature] 1-1 mapping between paimon buckets and kafka partitions [paimon]

via GitHub Mon, 06 May 2024 19:56:03 -0700


eric666666 commented on issue #3249:
URL: https://github.com/apache/paimon/issues/3249#issuecomment-2097349520


   If you want a 1->1 mapping, Paimon's bucket number should be larger than the 
number of Kafka partitions, and the data should be shuffled by Kafka partition id.
   I think Paimon can already do what you have in mind.
   Here is a demo. You can define the DDL like this.
   Kafka source table:
   ```
   CREATE TABLE KafkaTable (
     `event_time` TIMESTAMP(3) METADATA FROM 'value.source.timestamp' VIRTUAL,  -- from Debezium format
     `origin_table` STRING METADATA FROM 'value.source.table' VIRTUAL,  -- from Debezium format
     `partition_id` INT METADATA FROM 'partition' VIRTUAL,  -- from Kafka connector
     `offset` BIGINT METADATA VIRTUAL,  -- from Kafka connector
     `user_id` BIGINT,
     `item_id` BIGINT,
     `behavior` STRING
   ) WITH (
     'connector' = 'kafka',
     'topic' = 'user_behavior',
     'properties.bootstrap.servers' = 'localhost:9092',
     'properties.group.id' = 'testGroup',
     'scan.startup.mode' = 'earliest-offset',
     'value.format' = 'debezium-json'
   );
   ```
   Paimon sink table:
   ```
   CREATE TABLE IF NOT EXISTS sink_paimon_table
           WITH ('connector' = 'paimon',
           'bucket' = '3',  -- bucket number should be larger than the number of Kafka partitions
           'bucket-key' = 'partition_id',  -- bucket key must be the Kafka partition_id
           'merge-engine' = 'deduplicate',
           'primary-key' = 'partition_id,offset'  -- primary key must be partition_id,offset
            )
           LIKE KafkaTable (EXCLUDING ALL);
   ```
   So when the Kafka source table's data is inserted into the Paimon table, it is 
shuffled by the Kafka partition_id. partition_id is an int, whose hash code equals 
itself, so this pipeline maps each Kafka partition's records 1->1 to a Paimon bucket.
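
To make the reasoning above concrete, here is a small Python sketch of the mapping. It is only a simplified model of the argument in this comment: it assumes the bucket is chosen as the key's hash modulo the bucket count, and that the hash of an int `partition_id` is the value itself. The helper names `bucket_for` and `is_one_to_one` are made up for illustration, and Paimon's real bucket assignment may use a different hash function, so please verify against your Paimon version.

```python
def bucket_for(partition_id: int, num_buckets: int) -> int:
    """Simplified model (assumption): identity hash of the int key, then modulo."""
    return partition_id % num_buckets


def is_one_to_one(num_partitions: int, num_buckets: int) -> bool:
    """True if no two Kafka partitions land in the same bucket under this model."""
    buckets = [bucket_for(p, num_buckets) for p in range(num_partitions)]
    return len(set(buckets)) == num_partitions


if __name__ == "__main__":
    # Partitions 0..2 with 3 buckets: each partition gets its own bucket.
    print(is_one_to_one(num_partitions=3, num_buckets=3))
    # Partitions 0..3 with 3 buckets: partitions 0 and 3 collide in bucket 0.
    print(is_one_to_one(num_partitions=4, num_buckets=3))
```

Under this model, the mapping stops being 1->1 as soon as there are more Kafka partitions than buckets, which is why the bucket number must not be smaller than the partition count.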
   
   


