eric666666 commented on issue #3249:
URL: https://github.com/apache/paimon/issues/3249#issuecomment-2097349520
If you want a 1->1 mapping, Paimon's bucket number should be bigger than the
number of Kafka partitions, and records should be shuffled by Kafka partition id.
I think Paimon can already do what you have in mind.
Here is a demo. You can define the DDL like this.
Kafka source table:
```
CREATE TABLE KafkaTable (
`event_time` TIMESTAMP(3) METADATA FROM 'value.source.timestamp' VIRTUAL, -- from Debezium format
`origin_table` STRING METADATA FROM 'value.source.table' VIRTUAL, -- from Debezium format
`partition_id` INT METADATA FROM 'partition' VIRTUAL, -- from Kafka connector
`offset` BIGINT METADATA VIRTUAL, -- from Kafka connector
`user_id` BIGINT,
`item_id` BIGINT,
`behavior` STRING
) WITH (
'connector' = 'kafka',
'topic' = 'user_behavior',
'properties.bootstrap.servers' = 'localhost:9092',
'properties.group.id' = 'testGroup',
'scan.startup.mode' = 'earliest-offset',
'value.format' = 'debezium-json'
);
```
Paimon sink table:
```
CREATE table if not exists sink_paimon_table
WITH ('connector' = 'paimon',
'bucket' = '3', -- bucket number should be bigger than the number of Kafka partitions
'bucket-key' = 'partition_id', -- bucket key must be the Kafka partition_id
'merge-engine' = 'deduplicate',
'primary-key' = 'partition_id,offset' -- primary key must be partition_id,offset
)
LIKE KafkaTable (EXCLUDING ALL)
```
With this setup, data inserted from the Kafka source table into the Paimon table
is shuffled by the Kafka partition_id. partition_id is an INT, whose hash code
equals its own value, so this pipeline maps each Kafka partition's records 1->1
to a Paimon bucket.
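To make the reasoning above concrete, here is a minimal Java sketch. It assumes the bucket is chosen as hash(bucket key) mod bucket count and that the hash of an INT is the value itself, as described in this comment. `BucketMappingSketch`, `bucketFor` and `mapPartitions` are hypothetical names for illustration, not Paimon APIs, and Paimon's real bucket assignment may hash the key differently, so the concrete bucket numbers in a real table may not match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BucketMappingSketch {

    // Assumed bucket rule (not taken from Paimon's source): hash(key) mod bucket count.
    static int bucketFor(int partitionId, int numBuckets) {
        // Integer.hashCode(x) returns x itself for an int value.
        return Math.floorMod(Integer.hashCode(partitionId), numBuckets);
    }

    // Maps Kafka partition ids 0..numPartitions-1 to their assumed bucket.
    static Map<Integer, Integer> mapPartitions(int numPartitions, int numBuckets) {
        Map<Integer, Integer> mapping = new LinkedHashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            mapping.put(p, bucketFor(p, numBuckets));
        }
        return mapping;
    }

    public static void main(String[] args) {
        // 3 Kafka partitions and 'bucket' = '3', as in the DDL above.
        mapPartitions(3, 3).forEach((p, b) ->
            System.out.println("kafka partition " + p + " -> bucket " + b));
    }
}
```

Under this assumed rule, partition ids 0..n-1 land in distinct buckets whenever the bucket count is at least n. With fewer buckets than partitions, two partitions would share a bucket and the mapping would no longer be 1->1.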